
January 12, 2014

Putting the Cart Before the Horse: The Forensic Application of the SRA-FV

As the developers of actuarial instruments such as the Static-99R acknowledge that their original norms inflated the risk of re-offense for sex offenders, a brand-new method is cropping up to preserve those inflated risk estimates in sexually violent predator civil commitment trials. The method introduces a new instrument, the “SRA-FV,” in order to bootstrap special “high-risk” norms on the Static-99R. Curious about the scientific support for this novel approach, I asked forensic psychologist and statistics expert Brian Abbott to weigh in.

Guest post by Brian Abbott, PhD*

NEWS FLASH: Results from the first peer-reviewed study about the Structured Risk Assessment: Forensic Version (“SRA-FV”), published in Sexual Abuse: Journal of Research and Treatment (“SAJRT”), demonstrate the instrument is not all that it’s cracked up to be.
Promotional material for an SRA-FV training
For the past three years, the SRA-FV developer has promoted the instrument for clinical and forensic use despite the absence of peer-reviewed, published research supporting its validity, reliability, and generalizability. Nonetheless, some clinicians who have attended SRA-FV trainings around the country routinely apply the SRA-FV in sexually violent predator risk assessments and testify about its results in court as if the instrument had been proven to measure what it intends to assess, had known error rates, retained validity when applied to other groups of sexual offenders, and produced trustworthy results.

Illustrating this rush to acceptance most starkly, within just three months of its informal release (February 2011) and in the absence of any peer-reviewed research, the state of California adopted the SRA-FV as its statewide mandated dynamic risk measure for assessing sexual offenders in the criminal justice system. The decision was rescinded in September 2013, when the SRA-FV was replaced with a similar instrument, the Stable-2007.

The SRA-FV consists of 10 items that purportedly measure “long-term vulnerabilities” associated with sexual recidivism risk. The items are distributed among three risk domains and are assessed either with standardized rating criteria devised by the developer or by scoring certain items from the Psychopathy Checklist-Revised (PCL-R). Total scores on the SRA-FV range from zero to six. Items include sexual interest in children, lack of emotionally intimate relationships with adults, callousness, and internal grievance thinking. Patients from the Massachusetts Treatment Center in Bridgewater, Massachusetts, who were evaluated as sexually dangerous persons between 1959 and 1984, served as the SRA-FV construction group (of unknown size) and validation sample (N = 418). The instrument was released for use in December 2010 by Dr. David Thornton, a co-developer of the Static-99R, Static-2002R, and SRA-FV and research director at the SVP treatment program in Wisconsin, during a training held in Atascadero, California. Since then, Dr. Thornton has held similar trainings around the nation, where he asserts that the SRA-FV is valid for predicting sexual recidivism risk, achieves incremental validity over the Static-99R, and can be used to choose among Static-99R reference groups.

A primary focus of the trainings is a novel system in which the total score on the SRA-FV is used to select one of three available Static-99R “reference groups.” The developer describes the statistical modeling underlying this procedure, which he claims increases predictive validity and power over using the Static-99R alone. However, no reliability data are offered to support this claim. In the December 2010 training, several colleagues and I asked for the inter-rater agreement rate, but Dr. Thornton refused to provide it.

I was astounded, but not surprised, when some government evaluators in California started applying the SRA-FV in sexually violent predator risk assessments within 30 days of the December 2010 training. The trend blossomed in other jurisdictions with sexually violent predator civil confinement laws. Typically, government evaluators applied the SRA-FV to select among Static-99R reference groups, invariably choosing the “High Risk High Needs” sample, the group with the highest re-offense rates. A minority of clinicians stated in reports and court testimony that the SRA-FV increased predictive accuracy over the Static-99R alone, but they were unable to quantify this effect. The same clinicians argued that the pending publication of the Thornton and Knight study was sufficient to justify the instrument's use in civil confinement risk assessments for sexually violent predators. They appeared to imply that the mere fact that a construction and validation study had been accepted for publication was an imprimatur that the instrument was reliable and valid for its intended purposes. Now that the research has been peer-reviewed and published, the results show that these government evaluators put the proverbial cart before the horse.

David Thornton and Raymond Knight penned an article that documents the construction and validation of the SRA-FV. The publication is a step in the right direction, but by no means do the results justify widespread application of the SRA-FV in sexual offender risk assessment in general or sexually violent predator proceedings in particular. Rather, the results of the study only apply to the group upon which the research was conducted and do not generalize to other groups of sexual offenders. Before discussing the limitations of the research, I would like to point out some encouraging results.

The SRA-FV did, as its developer claimed, account for more sources of sexual recidivism risk than the Static-99R alone. However, it remains unknown which of the SRA-FV’s ten items contribute to risk prediction. The study also found that the combination of the Static-99R and SRA-FV increased predictive power. This improved predictive accuracy, however, must be replicated to determine whether the combination of the two instruments will perform similarly in other groups of sexual offenders. This is especially important when considering that the SRA-FV was constructed and validated on individuals from the Bridgewater sample from Massachusetts who are not representative of contemporary groups of sexual offenders. Thornton and Knight concede this point when discussing how the management of sexual offenders through all levels of the criminal justice system in Massachusetts between 1959 and 1984 was remarkably lenient compared to contemporary times. Such historical artifacts likely compromise any reliable generalization from patients at Bridgewater to present-day sexual offenders.

Training materials presented four months before the State of California rescinded use of the SRA-FV

Probably the most crucial finding from the study is the SRA-FV's poor inter-rater reliability. The authors categorize the 64 percent rate of agreement as “fair.” It is well known that inter-rater agreement in research studies is typically higher than in real-world applications, a point addressed previously in this blog in regard to the PCL-R. A field reliability study of the SRA-FV among 19 government psychologists rating 69 sexually violent predators in Wisconsin (Sachsenmaier, Thornton, & Olson, 2011) found an inter-rater agreement rate of only 55 percent for the SRA-FV total score, which is considered poor reliability. These data indicate that 36 to 45 percent of SRA-FV scoring constitutes error, raising serious concerns about the trustworthiness of the instrument. To their credit, Thornton and Knight acknowledge this issue and note that steps should be taken to increase scoring reliability. Nonetheless, the current inter-rater reliability falls far short of the 80 percent floor recommended for forensic practice (Heilbrun, 1992). Unless steps are taken to dramatically improve reliability, the claims that the SRA-FV increases predictive accuracy, either alone or in combination with the Static-99R, and that it can be used to select Static-99R reference groups, are moot.
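To make the arithmetic above concrete, here is a minimal sketch (in Python, with invented ratings) of how a percent-agreement figure is computed and how its complement yields the error rates described:

```python
# Minimal sketch with hypothetical SRA-FV total scores from two raters.
rater_a = [3, 4, 2, 5, 1, 4, 3, 2, 5, 4]
rater_b = [3, 5, 2, 4, 1, 4, 2, 2, 5, 3]

# Percent exact agreement: the share of cases scored identically.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)

print(f"Exact agreement: {percent_agreement:.0f}%")           # 60% here
print(f"Implied error rate: {100 - percent_agreement:.0f}%")  # the 100-minus-agreement logic
```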

It is also important to note that, although Thornton and Knight confuse the terms validation and cross-validation in their article, this study represents a validation methodology. Cross-validation is the process by which the statistical properties found in a validation sample (such as reliability, validity, and item correlations) are tested in a separate group to see whether they hold up. In contrast, Thornton and Knight first used the available research data from a small number of individuals from the Bridgewater group to determine which items would be included in the SRA-FV; this group is referred to as the construction sample. The statistical properties of the newly conceived measure were then studied on 418 Bridgewater patients, who constitute the validation sample. The psychometric properties found in the validation group have not been tested on other, contemporary sexual offender groups. Absent such cross-validation studies, we simply have no confidence that the SRA-FV works as designed for groups other than the sample on which it was validated. To their credit, Thornton and Knight acknowledge this limitation and warn readers not to generalize the validation research to contemporary groups of sexual offenders.
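For readers unfamiliar with the distinction, here is a conceptual sketch of the three-step logic described above, using synthetic data and scikit-learn. The features, samples, and model are stand-ins, not the actual SRA-FV items or Bridgewater data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_sample(n):
    """Generate a synthetic sample: 10 stand-in risk items and an outcome."""
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + rng.normal(size=n) > 1).astype(int)
    return X, y

# Step 1: construction sample -- used to select items and build the scale.
X_con, y_con = make_sample(100)
model = LogisticRegression().fit(X_con, y_con)

# Step 2: validation sample -- statistical properties estimated here
# (analogous to the 418 Bridgewater patients).
X_val, y_val = make_sample(418)
print("Validation AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

# Step 3: cross-validation in the sense used above -- testing whether those
# properties hold in a separate, contemporary group. This is the step the
# post says has not yet been carried out for the SRA-FV.
X_new, y_new = make_sample(400)
print("Independent-group AUC:", roc_auc_score(y_new, model.predict_proba(X_new)[:, 1]))
```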

The data on incremental predictive validity, while interesting, have little practical value at this point for two reasons. One, it is unknown whether the results will replicate in contemporary groups of sexual offenders. Two, no data are provided to quantify the increased predictive power. The study does not provide an experience table of probability estimates at each score on the Static-99R after taking into account the effect of the SRA-FV scores. It seems disingenuous, if not misleading, to inform the trier of fact that the combined measures increase predictive power but to fail to quantify the result and the associated error rate.
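To illustrate what such an "experience table" might look like, here is a hypothetical sketch built from a combined logistic model. The coefficients are invented for illustration only and do not come from the Thornton and Knight study:

```python
import math

# Invented logistic coefficients for a combined Static-99R + SRA-FV model.
b0, b_static, b_sra = -3.5, 0.3, 0.4

def p_recidivism(static99r, sra_fv):
    """Predicted probability of recidivism under the hypothetical model."""
    logit = b0 + b_static * static99r + b_sra * sra_fv
    return 1 / (1 + math.exp(-logit))

# An experience table: probability at each Static-99R score, conditioned
# on a low versus a high SRA-FV score.
print("Static-99R | p (SRA-FV = 1) | p (SRA-FV = 5)")
for score in range(-3, 13):
    print(f"{score:10d} | {p_recidivism(score, 1):14.2f} | {p_recidivism(score, 5):14.2f}")
```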

In my practice, I have seen the SRA-FV used most often to select among three Static-99R reference groups. Invariably, government evaluators in sexually violent predator risk assessments assign SRA-FV total scores consistent with the selection of the Static-99R High Risk High Needs reference group. Only the risk estimates associated with the highest Static-99R scores in this reference group are sufficient to support an opinion that an individual meets the statutory level of sexual dangerousness necessary to justify civil confinement. Government evaluators who have used the SRA-FV for this purpose cannot cite research demonstrating that the procedure works as intended or that it produces a reliable match to the group representing the individual being assessed. Unfortunately, Thornton and Knight are silent on this application of the SRA-FV.

In a recently published article, I tested the use of the SRA-FV for selecting Static-99R reference groups. In brief, Dr. Thornton used statistical modeling based solely on data from the Bridgewater sample to devise this system. The reference group selection method was not based on the actual scores of members of each of the three reference groups. Rather, it was hypothetical, presuming that members of a given Static-99R reference group would exhibit a range of SRA-FV scores that does not overlap with those of the other two reference groups. To the contrary, I found that the hypothetical SRA-FV reference group system did not work as designed: the SRA-FV scores between reference groups overlapped by wide margins. In other words, a given SRA-FV total score would likely be consistent with selecting two, if not all three, Static-99R reference groups. In light of these findings, it is incumbent upon the developer to provide research using actual subjects to prove that the SRA-FV total score is a valid method by which to select a single Static-99R reference group and that the procedure can be applied reliably. At this point, credible support does not exist for using the SRA-FV to select Static-99R reference groups.
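The overlap problem can be made concrete with a rough sketch like the following, using invented score distributions for the three reference groups (the actual distributions were the subject of the published analysis):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented SRA-FV score distributions for the three reference groups.
groups = {
    "Routine": rng.normal(1.5, 1.0, 500),
    "Preselected Treatment Need": rng.normal(2.5, 1.0, 500),
    "High Risk High Needs": rng.normal(3.5, 1.0, 500),
}

for name, scores in groups.items():
    lo, hi = np.percentile(scores, [10, 90])
    print(f"{name:28s} 10th-90th percentile: {lo:.1f} to {hi:.1f}")

# If these ranges overlap heavily, a given SRA-FV total score is consistent
# with more than one reference group -- the problem described above.
```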

The design, development, validation, and replication of psychological instruments are guided by the Standards for Educational and Psychological Testing (“SEPT” -- American Educational Research Association et al., 1999). When the Thornton and Knight study is compared against the framework provided by SEPT, it is apparent that the SRA-FV is in the infancy stage of development. At best, the SRA-FV is a work in progress that needs substantially more research to improve its psychometric properties. Aside from its low reliability and the inability to generalize the validation research to other groups of sexual offenders, other important statistical properties await examination, including but not limited to:

  1. standard error of measurement (a brief worked example follows this list)
  2. factor analysis of whether the items within each of the three risk domains load significantly on their respective domains
  3. the extent of the correlation between each SRA-FV item and sexual recidivism
  4. which SRA-FV items add incremental validity beyond the Static-99R, and which may be redundant with it
  5. whether each item has construct validity. 
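As a brief worked example of the first item, classical test theory gives the standard error of measurement as SEM = SD x sqrt(1 - r), where r is a reliability coefficient. The sketch below uses a hypothetical score standard deviation and, purely for illustration, treats the reported 64 percent agreement rate as a stand-in reliability coefficient (agreement rates and reliability coefficients are not strictly interchangeable):

```python
import math

sd = 1.5            # hypothetical standard deviation of SRA-FV total scores
reliability = 0.64  # stand-in value; the study's "fair" agreement rate

sem = sd * math.sqrt(1 - reliability)
print(f"SEM = {sem:.2f} points on the 0-6 scale")

# A 95% confidence band around an observed score spans roughly +/- 1.96 * SEM,
# here about +/- 1.8 points -- a wide band on a six-point scale.
```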

It is reasonable to conclude that, at its current stage of development, the use of the SRA-FV in forensic proceedings is premature and scientifically indefensible. In closing, in their eagerness to improve the accuracy of their risk assessments, clinicians relied upon Dr. Thornton's claims in the absence of peer-reviewed research demonstrating that the SRA-FV achieved generally accepted levels of reliability and validity. The history of forensic evaluators deploying the SRA-FV before the publication of the construction and validation study raises significant ethical and legal questions:

  • Should clinicians be expected to vet the research presented in trainings by an instrument's developer before applying a tool in forensic practice? 

  • What responsibility do clinicians have to rectify testimony where they presented the SRA-FV as if the results were reliable and valid?

  • How many individuals have been civilly committed as sexually violent predators based on testimony that findings from the SRA-FV were consistent with meeting the legal threshold for sexual dangerousness, when the published data do not support this conclusion?

Answers to these questions and others go beyond the scope of this blog. However, in a recent appellate decision, a Washington appeals court questioned the admissibility of the SRA-FV in the civil confinement trial of Steven Ritter. The appellate court determined that the application of the SRA-FV was critical to the government evaluator's opinion that Mr. Ritter met the statutory threshold for sexual dangerousness. Because the SRA-FV is considered a novel scientific procedure, the appeals court reasoned that the trial court erred by not holding a defense-requested evidentiary hearing to decide whether the SRA-FV was admissible evidence for the jury to hear. The appeals court remanded the issue to the trial court with instructions to hold a Kelly-Frye hearing on the SRA-FV. Stay tuned!

References

Abbott, B.R. (2013). The Utility of Assessing “External Risk Factors” When Selecting Static-99R Reference Groups. Open Access Journal of Forensic Psychology, 5, 89-118.

American Educational Research Association, American Psychological Association and National Council on Measurement in Education. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.

Heilbrun, K. (1992). The role of psychological testing in forensic assessment. Law and Human Behavior, 16, 257-272. doi: 10.1007/BF01044769.

In re the Detention of Steven Ritter (2013, November). Court of Appeals of the State of Washington, Division III. 

Sachsenmaier, S., Thornton, D., & Olson, G. (2011, November). Structured risk assessment forensic version (SRA-FV): Score distribution, inter-rater reliability, and margin of error in an SVP population. Presentation at the 30th Annual Research and Treatment Conference of the Association for the Treatment of Sexual Abusers, Toronto, Canada.

Thornton, D. & Knight, R.A. (2013). Construction and validation of the SRA-FV Need Assessment. Sexual Abuse: A Journal of Research and Treatment. Published online December 30, 2013. doi:10.1177/1079063213511120. 
* * *


*Brian R. Abbott is a licensed psychologist in California and Washington who has evaluated and treated sexual offenders for more than 35 years. Among his areas of forensic expertise, Dr. Abbott has worked with sexually violent predators in various jurisdictions within the United States, where he performs psychological examinations, trains professionals, consults on psychological and legal issues, offers expert testimony, and publishes papers and peer-reviewed articles.



(c) Copyright 2013 - All rights reserved

June 17, 2010

Psychopathy brouhaha: It's a wrap (I hope!)

Today's Scientific American has more on the censorship controversy I've featured here in recent weeks. As regular readers know, the flap centers around allegations that psychopathy researcher Robert Hare tried to silence critics by threatening to sue. The controversial article was finally published this month in the American Psychological Association publication Psychological Assessment, but the fallout continues.

The column by J.R. Minkel, oddly titled "Fear Review," features a rundown, including commentary by prominent scholar Stephen Hart:
People familiar with the matter say the scale's author, Robert Hare of the University of British Columbia, deserves only partial blame for the delay, to be shared with the American Psychological Association (APA), the journal's publisher. But they say Hare's use of legal threats has at best subverted the peer review process that is the crux of modern scientific progress, and could at worst encourage junior researchers in the field of forensic psychology to pursue other lines of research.

"I find this action to be completely inconsistent with the man I had [great] respect and affection for," says Stephen Hart of Simon Fraser University in British Columbia, a collaborator and former student of Hare's. "People I speak with automatically think, 'Well, what's in that article that makes him so upset? What's he so afraid of?'
After reading all of the publicly available materials on the controversy, as well as numerous email posts on professional listservs, here's how I boil things down to the essence:
  • The Skeem and Cooke article is an important scientific analysis of the theoretical construct of psychopathy, which is increasingly being used as a weapon in court with grave consequences for those it is deployed against.
  • Not surprisingly, Robert Hare disagrees with Skeem and Cooke. Specifically, he does not agree with their claim that his Psychopathy Checklist, or the underlying psychopathy construct, treats criminality as a central component.
  • Hare claims that Skeem and Cooke distorted his work. In a written response, he gives three examples of alleged distortions. Presumably, since he was preparing his response for publication, he picked the best examples he could find to illustrate his complaint. Yet, these are nowhere near as egregious as I had imagined they would be, given his threat to sue.
  • Hare accuses two well respected psychology-law leaders, Norm Poythress and John Petrila, of being biased and misinformed. But nothing in his response supports this. Poythress and Petrila, in their article in the International Journal of Forensic Mental Health that set this whole ball in motion, were careful not to take sides in the underlying scientific debate over psychopathy. Rather, they focused on the threat to academic freedom and science posed by threats to sue: "Academic freedom rests on the premise that advances in science can only occur if scholars are permitted to pursue free competition among ideas. This assumes that scholars have the liberty to do their work free from limitations imposed by political or religious pressure or by economic reprisals."
  • Hare has claimed elsewhere that his "lawsuit threat was meant only to get the 'attention' of APA, Skeem, and Cooke and force changes to the article." In his essay, he expresses bafflement at the ensuing, lengthy delay in the article's publication. To claim that his threat to sue did not contribute to the lengthy delay is either disingenuous or naïve. Especially in the wake of other controversies, such as the Rind debacle in which the U.S. Congress blasted the APA's publication and peer review process, the Association is undoubtedly very gun-shy and reactive over lawsuit threats.
The bottom line:

After analyzing all sides of the issue, I find that the Skeem and Cooke article is an important and timely contribution to the field, and that threats to sue over such publications set a dangerous precedent. As Poythress and Petrila point out in their commentary, potential negative effects of defamation threats against scientific researchers include -- among other things -- that:
  1. researchers avoid critical research out of fear of lawsuits,
  2. academics avoid volunteering as peer reviewers, and
  3. journal editors self-censor on controversial topics.
Censorship -- or even the appearance of censorship -- is especially dangerous when it involves critique of a construct that may be used in a partisan manner in the forensic arena.

Hare is entitled to express his opinion, but nothing in his public response changes these bottom lines. Rather, as Jennifer Skeem notes in today's Scientific American piece, all of this peripheral controversy distracts from the scientific critique of psychopathy, including her critique that was silenced for three years before finally seeing the light of day.

I sure hope this is my last blog post for a while on this topic!

PRIVATE NOTE TO TODAY'S "ANONYMOUS" BLOG COMMENTER: I regret that I had to reject your comment about the pecuniary angle from publication. While I found it quite interesting, I had no easy way to substantiate its accuracy.

GENERAL NOTE TO COMMENTERS: I encourage comments, but it's nice to know who is talking; please consider signing your name (or at least a pseudonym).

June 18, 2010

New study on juvenile sex offender treatment

Efficacy claimed, but control group questionable

I previously reviewed forensic psychologist Frank DiCataldo's excellent book on juvenile sex offenders, in which he claims there is very little good research to show that sex offender-specific treatment is useful with adolescents. As DiCataldo points out, most of the studies (and the meta-analyses of studies) purporting to show a treatment effect have serious flaws. One big problem is the lack of control groups. This makes it impossible to know whether juveniles who did well after sex-offender treatment would have done equally well without treatment or with generic treatment not focused on sex offending.

But now, a study out of Canada (where else?!) is being trumpeted as methodologically sound proof that -- drum roll here -- treatment works. The study followed 148 adolescents for up to 20 years, which is a very long time for this type of research. Overall, only 17 of the subjects (about 11.5%) picked up a new sex offense as adults, with another 7 getting a new charge only in adolescence, for a total of 24 recidivists (16%). That's in line with a growing body of data on the very low recidivism rates of juvenile sex offenders, ranging from about 4% to 15%.

When the researchers broke the results down, comparing youths who underwent their specialized treatment with a comparison group that did not, they found that only 9% (5 of 58) of the treated youths were charged with a new sex offense over the next 20 years, compared with 21% (19 of 90) of the untreated youths. That's a significant difference.
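For readers who want to check the group comparison themselves, here is a quick sketch running Fisher's exact test on the reported counts. Note that the study followed subjects over time, so its own analyses may differ from this simple endpoint comparison:

```python
from scipy.stats import fisher_exact

# 2x2 table from the reported counts: rows are treated vs. comparison,
# columns are recidivists vs. non-recidivists.
table = [[5, 58 - 5],     # treated: 5 of 58 recidivated
         [19, 90 - 19]]   # comparison: 19 of 90 recidivated

odds_ratio, p_value = fisher_exact(table)
print(f"Odds ratio: {odds_ratio:.2f}, two-sided p = {p_value:.3f}")
```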

But here's the rub. The participants were not randomly assigned to treatment (versus no treatment), which is how it's supposed to be done in psychotherapy treatment outcome research. Otherwise, you never know if there is something about the selection process that affected the results. Not only was assignment to groups not random, but the researchers put the kids who refused to undergo treatment, along with those who dropped out of treatment early, into their so-called "control" group! In fact, these bad boys comprised fully half of the non-treatment group. My guess is that these dropouts and refusers were probably a whole lot more delinquent than the other kids in the first place.

So it's possible that what the research really shows is not that treatment works, but that hardcore delinquents who refuse or drop out of treatment are likely to get into more trouble in their later teens and early 20s. It would be interesting to see if the control group still showed a higher recidivism rate if they removed the dropouts and refusers from the analyses. In fact, I would love to see some qualitative analyses of who those 19 recidivists (out of the total of 90 in the control group) are.
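Along the lines of the analysis suggested above, a simple sensitivity check might look like the following. The split of the 19 comparison-group recidivists between dropouts/refusers and the rest is not reported, so that number is invented purely for illustration:

```python
# Hypothetical sensitivity analysis: recompute the comparison-group
# recidivism rate after removing treatment refusers and dropouts.
dropouts_refusers = 45       # "fully half" of the 90-person group
recidivists_among_them = 12  # INVENTED for illustration; not reported

remaining_n = 90 - dropouts_refusers
remaining_recidivists = 19 - recidivists_among_them

print(f"Treated rate:            {5 / 58:.0%}")
print(f"Full comparison rate:    {19 / 90:.0%}")
print(f"Cleaned comparison rate: {remaining_recidivists / remaining_n:.0%}")
# If the cleaned rate drops toward the treated rate, the apparent
# treatment effect may be driven by the dropouts and refusers.
```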

Another potential confounder I noticed was that many of the adolescents in the non-treatment control group were apparently in some different kind of treatment at the time. That treatment is not described, so perhaps this study is more of a between-treatments design, rather than a study of treatment versus no treatment.

Again, as I've discussed previously, part of the "problem" both with accurately predicting which juveniles will reoffend and also with designing treatment programs that work is the very low overall rate of recidivism among juveniles who have sexually offended. As DiCataldo and others have pointed out, if you just predict that no juvenile caught for a sex crime will reoffend, you will be correct in the broad majority of cases.

Flattening trajectory

The study, by psychologist James Worling and his colleagues at the Sexual Abuse: Family Education & Treatment (SAFE-T) program in Ontario, did have some other interesting findings. The researchers found that most recidivism -- both sexual and nonsexual -- happens within the first few years. Offending flattens out significantly at about the 10-year mark, when folks hit about the age of 25. This is consistent with the recent study by Lussier and colleagues of sex offender trajectories, as well as the general criminology literature on desistance. Crime, including sex offending, is a young man's game.

The finding that only 11.5% of the participants were charged with subsequent sexual offenses as adults is also in line with other research showing very low recidivism for juveniles. A strength of this study is its long course; it followed kids all the way up to an average age of 31.

The study is: Worling, J.R., Littlejohn, A., & Bookalam, D. (2010). 20-year prospective follow-up study of specialized treatment for adolescents who offended sexually. Behavioral Sciences and the Law, 28, 46-57.

Special journal issue on adolescent sex offenders

NOTE: It looks like this entire special issue of Behavioral Sciences and the Law is available online. Other interesting articles in the issue include:

Inter-rater reliability of the PCL-R total and factor scores among psychopathic sex offenders: are personality features more prone to disagreement than behavioral features?
John F. Edens, Marcus T. Boccaccini, Darryl W. Johnson

Searching for the developmental origins of sexual violence: examining the co-occurrence of physical aggression and sexual behaviors in early childhood
Patrick Lussier, Jay Healey

Assessing risk of sexually abusive behavior among youth in a child welfare sample
Robert A. Prentky, Nien-Chen Li, Sue Righthand, Ann Schuler, Deborah Cavanaugh, Austin F. Lee

Psychological mechanisms underlying support for juvenile sex offender registry laws: prototypes, moral outrage, and perceived threat
Jessica M. Salerno, Cynthia J. Najdowski, Margaret C. Stevenson, Tisha R. A. Wiley, Bette L. Bottoms, Roberto Vaca Jr., Pamela S. Pimentel

Legal, ethical, and methodological considerations in the Internet-based study of child pornography offenders
James V. Ray, Eva R. Kimonis, Christine Donoghue

April 19, 2015

Static-99: A bumpy developmental path

By Brian Abbott, PhD and Karen Franklin, PhD* 

The Static-99 is the most widely used instrument for assessing sex offenders’ future risk to the public. Indeed, some state governments and other agencies even mandate its use. But bureaucratic faith may be misplaced. Conventional psychological tests go through a standard process of development, beginning with the generation and refinement of items and proceeding through set stages that include pilot testing and replication, leading finally to peer review and formal publication. The trajectory of the Static-99 has been more haphazard: Since its debut 15 years ago, the tool has been in a near-constant state of flux. Myriad changes in items, instructions, norms and real-world patterns of use have cast a shadow over its scientific validity. Here, we chart the unorthodox developmental course of this tremendously popular tool.
 
 
Static-99 and 99R Developmental Timeline

1990: The first Sexually Violent Predator (SVP) law passes in the United States, in Washington. A wave of similar laws begins to sweep the nation.

1997: The US Supreme Court upholds the constitutionality of preventive detention of sex offenders.

1997: R. Karl Hanson, a psychologist working for the Canadian prison system, releases a four-item tool to assess sex offender risk. The Rapid Risk Assessment for Sex Offence Recidivism (RRASOR) uses data from six settings in Canada and one in California.[1]

1998: Psychologists David Thornton and Don Grubin of the UK prison system release a similar instrument, the Structured Anchored Clinical Judgment (SACJ-Min) scale.[2]

1999: Hanson and Thornton combine the RRASOR and SACJ-Min to produce the Static-99, accompanied by a three-page list of coding rules.[3] The instrument's original validity data derive from four groups of sex offenders: three from Canada and one from the UK (none from the United States). The new instrument is atheoretical, with scores interpreted based on the recidivism patterns among these 1,208 offenders, most of them released from prison in the 1970s.

2000: Hanson and Thornton publish a peer-reviewed article on the new instrument.[4]

2003: New coding rules are released for the Static-99 in an 84-page, unpublished booklet that is not peer reviewed.[5] The complex and sometimes counterintuitive rules may lead to problems with scoring consistency, although research generally shows the instrument can be scored reliably.

2003: The developers release a new instrument, the Static-2002, intended to "address some of the weaknesses of Static-99."[6] The new instrument is designed to be more logical and easier to score; one item from the Static-99, pertaining to whether the subject had lived with a lover for at least two years, is dropped due to issues with its reliability and validity. Despite its advantages, the Static-2002 never catches on, and does not achieve the popularity of the Static-99 in forensic settings.

2007: Leslie Helmus, a graduate student working with Karl Hanson, reports that contemporary samples of sex offenders have much lower offense rates than did the antiquated, non-US samples upon which the Static-99 was originally developed, both in terms of base rates of offending and rates of recidivism after release from custody.[7]

September 2008: Helmus releases a revised actuarial table for the Static-99, allowing evaluators to link their subjects' total scores to corresponding estimates of risk.[8] Another Static-99 developer, Amy Phenix, releases the first of several "Evaluators' Handbooks."[9]

October 2008: At an annual convention of the Association for the Treatment of Sexual Abusers (ATSA), Andrew Harris, a Canadian colleague of Hanson's, releases a new version of the Static-99 with three separate "reference groups" (Complete, CSC, and High Risk) to which subjects can be compared. Evaluators are instructed to report a range of risks for recidivism, with the lower bound coming from a set of Canadian prison cases (the so-called CSC, or Correctional Service of Canada, group) and the upper bound derived from a so-called "high-risk" group of offenders. The risk of the third, or "Complete," group is hypothesized as falling somewhere between those of the other two groups.[10]

November 2008: At a workshop sponsored by a civil commitment center in Minnesota, Thornton and a government evaluator named Dennis Doren propose yet another method of selecting among the new reference groups. In a procedure called "cohort matching," they suggest comparing an offender with either the CSC or High Risk reference group based on how well the subject matches a list of external characteristics they had created but never empirically tested or validated.[11]

December 2008: Phenix and California psychologist Dale Arnold put forth yet another idea for improving the accuracy of the Static-99: After reporting the range of risk based on a combination of the CSC and High Risk reference groups, evaluators are encouraged to consider a set of external factors, such as whether the offender dropped out of treatment and the offender's score on Robert Hare's controversial Psychopathy Checklist-Revised (PCL-R). This method does not catch on.[12] [13]

2009: An official Static-99 website, www.static99.org, debuts.[14]

Winter 2009: The Static-99 developers admit that the norms they developed in 2000 are not being replicated: The same score on the Static-99 equates with wide variations in recidivism rates depending on the sample to which it is compared. They theorize that the problem stems from large reductions in Canadian and US recidivism rates since the 1970s and 1980s, and they call for the development of new norms.[15]

September 2009: Hanson and colleagues roll out a new version of the instrument, the Static-99R.[16] It addresses a major criticism by more precisely considering an offender's age at release, an essential factor in reoffense risk. The old Static-99 norms are deemed obsolete and are replaced by data from 23 samples collected by Helmus for her unpublished Master's thesis. The samples vary widely in regard to risk. For estimating risk, the developers now recommend use of the cohort-matching procedure to select among four new reference group options. They also introduce the concepts of percentile ranks and relative risk ratios, along with a new Evaluators' Workbook for the Static-99R and Static-2002R. The instructions for selecting a reference group other than routine corrections are confusing and speculative, and research is lacking to demonstrate that selecting any group other than routine corrections produces more accurate risk estimates.[17]

November 2009: Just two months after its introduction, the Evaluators' Workbook for the Static-99R and Static-2002R is withdrawn due to errors in its actuarial tables.[18] The replacement workbook provides the same confusing and speculative method for selecting a nonroutine reference group, a method that lacks scientific validation and reliability.

2010: An international team of researchers presents large-scale data from the United States, New Zealand, and Australia indicating that the Static-99 would be more accurate if it took better account of an offender's age.[19] The Static-99 developers do not immediately embrace these researchers' suggestions.

January 2012: Amy Phenix and colleagues introduce a revised Evaluators' Workbook for the Static-99R and Static-2002R.[20] The new manual makes a number of revisions both to the underlying data (including percentile rank and relative risk ratio data) and to the recommended procedure for selecting a reference group. Now, in an increasingly complex procedure, offenders are to be compared to one of three reference groups based on how many external risk factors they have: Routine Corrections (low risk), Preselected Treatment Need (moderate risk), or Preselected High Risk Need (high risk). Subsequent research shows that using the density of external risk factors to select among the three reference group options is not valid and has no proven reliability.[21] A fourth reference group, Nonroutine Corrections, may be selected using a separate cohort-matching procedure. New research indicates that evaluators who are retained most often by the prosecution are more likely than others to select the high-risk reference group,[22] which has base rates much higher than those found in contemporary sexual recidivism studies and will thus produce exaggerated risk estimates.[23]

July 2012: Six months later, the percentile ranks and relative risk ratios are once again modified with the issuance of the third edition of the Static-99R and Static-2002R Evaluators' Handbook.[24] No additional data are provided to show that selecting a nonroutine reference group produces more accurate risk estimates than choosing the routine corrections reference group.

October 2012: In an article published in Criminal Justice & Behavior, the developers concede that risk estimates for the 23 offender samples undergirding the Static-99 vary widely. Further, absolute risk levels for typical sex offenders are far lower than previously reported, with the typical sex offender having about a 7% chance of committing a new sex offense within five years. The developers theorize that the Static-99 might be inflating risk of reoffense because the offenders in its underlying samples tended to be higher risk than average.[25]

2012: The repeated refusal of the Static-99 developers to share their underlying data with other researchers, so that its accuracy can be verified, leads to a court order excluding use of the instrument in a Wisconsin case.[26]

October 2013: At an annual ATSA convention, Hanson and Phenix report that an entirely new reference group selection system will be released in a peer-reviewed article in spring 2014.[27] The new system will include only two reference groups: Routine Corrections and Preselected High Risk High Need. An atypical sample of offenders from a state hospital in Bridgewater, Massachusetts, dating back to 1958 is to be removed altogether, along with some other samples, while some new data sets are to be added.

October 2014: At the annual ATSA convention, the developers once again announce that the anticipated rollout of the new system has been pushed back pending acceptance of the manuscript for publication. Helmus nonetheless presents an overview.[28] She reports that the new system will abandon two of the current four reference groups, retaining only Routine Corrections and Preselected High Risk Need. Evaluators should now use the Routine Corrections norms as the default unless local norms (with a minimum of 100 recidivists) are available. Evaluators will be permitted to choose the Preselected High Risk Need norms based on "strong, case-specific justification," but no specific guidance or empirical evidence to support such a procedure is proffered. A number of other new options for reporting risk information are also presented, including the idea of combining Static-99 data with data from newly developed, so-called "dynamic risk instruments."

January 2015: In an ATSA convention presentation followed by an article in the journal Sexual Abuse,[29] the developers announce further changes to their data sets and to how Static-99R scores should be interpreted. Only two of the original four "reference groups" are still standing. Of these, the Routine group has grown by 80 percent (to 4,325 subjects), while the High Risk group has shrunk by 35 percent, to a paltry 860 individuals. Absent from the article is any actuarial table for the High Risk group, meaning the controversial practice by some government evaluators of inflating risk estimates by comparing sex offenders' Static-99R scores with the High Risk group data has still not passed any formal peer review. The developers also correct a statistical method as recommended by Ted Donaldson and colleagues back in 2012,[30] the effect of which is to further lower risk estimates in the high-risk group. Only sex offenders in the Routine group with Static-99R scores of 10 are now statistically more likely than not to reoffend. It is unknown how many sex offenders were civilly committed in part due to reliance on the now-obsolete data.

References


[1] Hanson, R. K. (1997). The development of a brief actuarial risk scale for sexual offense recidivism. (Unpublished report 97-04). Ottawa: Department of the Solicitor General of Canada.
[2] Grubin, D. (1998). Sex offending against children: Understanding the risk. Unpublished report, Police Research Series Paper 99. London: Home Office.
[3] Hanson, R. K., & Thornton, D. (1999). Static 99: Improving Actuarial Risk Assessments for Sex Offenders. Unpublished paper.
[4] Hanson, R. K., & Thornton, D. (2000). Improving risk assessments for sex offenders: A comparison of three actuarial scales. Law and Human Behavior, 24(1), 119-136.
[5] Harris, A. J. R., Phenix, A., Hanson, R. K., & Thornton, D. (2003). Static-99 coding rules: Revised 2003. Ottawa, ON: Solicitor General Canada.
[6] Hanson, R. K., Helmus, L., & Thornton, D. (2010). Predicting recidivism amongst sexual offenders: A multi-site study of Static-2002. Law & Human Behavior, 34, 198-211.
[7] Helmus, L. (2007). A multi-site comparison of the validity and utility of the Static-99 and Static-2002 for risk assessment with sexual offenders. Unpublished Honour’s thesis, Carleton University, Ottawa, ON, Canada.
[8] Helmus, L. (2008, September). Static-99 Recidivism Percentages by Risk Level. Last Updated September 25, 2008. Unpublished paper.
[9] Phenix, A., Helmus, L., & Hanson, R.K. (2008, September). Evaluators’ Workbook. Unpublished, September 28, 2008
[10] Harris, A. J. R., Hanson, K., & Helmus, L. (2008). Are new norms needed for Static-99? Workshop presented at the ATSA 27th Annual Research and Treatment Conference on October 23, 2008, Atlanta: GA. Available at www.static99.org.
[11] Doren, D., & Thornton, D. (2008). New Norms for Static-99: A Briefing. A workshop sponsored by Sand Ridge Secure Treatment Center on November 10, 2008. Madison, WI.
[12] Phenix, A. & Arnold, D. (2008, December). Proposed Considerations for Conducting Sex Offender Risk Assessment Draft 12-14-08. Unpublished paper.
[13] Abbott, B. (2009). Applicability of the new Static-99 experience tables in sexually violent predator risk assessments. Sexual Offender Treatment, 1, 1-24.
[14] Helmus, L., Hanson, R. K., & Thornton, D. (2009). Reporting Static-99 in light of new research on recidivism norms. The Forum, 21(1), Winter 2009, 38-45.
[15] Ibid.
[16] Hanson, R. K., Phenix, A., & Helmus, L. (2009, September). Static-99(R) and Static-2002(R): How to Interpret and Report in Light of Recent Research. Paper presented at the 28th Annual Research and Treatment Conference of the Association for the Treatment of Sexual Abusers, Dallas, TX, September 28, 2009.
[17] DeClue, G. & Zavodny, D. (2014). Forensic use of the Static-99R: Part 4. Risk Communication. Journal of Threat Assessment and Management, 1(3), 145-161.
[18] Phenix, A., Helmus, L., & Hanson, R.K. (2009, November). Evaluators’ Workbook. Unpublished, November 3, 2009.
[19] Wollert, R., Cramer, E., Waggoner, J., Skelton, A., & Vess, J. (2010). Recent Research (N = 9,305) Underscores the Importance of Using Age-Stratified Actuarial Tables in Sex Offender Risk Assessments. Sexual Abuse: A Journal of Research and Treatment, 22 (4), 471-490. See also: "Age tables improve sex offender risk estimates," In the News blog, Dec. 1, 2010.
[20] Phenix, A., Helmus, L., & Hanson, R.K. (2012, January). Evaluators’ Workbook. Unpublished, January 9, 2012.
[21] Abbott, B.R. (2013). The Utility of Assessing “External Risk Factors” When Selecting Static-99R Reference Groups. Open Access Journal of Forensic Psychology, 5, 89-118.
[22] Chevalier, C., Boccaccini, M. T., Murrie, D. C., & Varela, J. G. (2014). Static-99R reporting practices in sexually violent predator cases: Does norm selection reflect adversarial allegiance? Law & Human Behavior.
[23] Abbott (2013) op. cit.
[24] Phenix, A., Helmus, L., & Hanson, R.K. (2012, July). Evaluators’ Workbook. Unpublished, July 26, 2012.
[25] Helmus, L., Hanson, R. K., Thornton, D., Babchishin, K. M., & Harris, A. J. R. (2012). Absolute recidivism rates predicted by Static-99R and Static-2002R sex offender risk assessment tools vary across samples: A meta-analysis. Criminal Justice & Behavior. See also: "Static-99R risk estimates wildly unstable, developers admit," In the News blog, Oct. 18, 2012.
[27] Hanson, R.K. & Phenix, A. (2013, October). Report writing for the Static-99R and Static-2002R. Preconference seminar presented at the 32nd Annual Research and Treatment Conference of the Association for the Treatment of Sexual Abusers, Chicago, IL, October 30, 2013. See also: "Static-99 'norms du jour' get yet another makeover," In the News blog, Nov. 17, 2013.
[28] Helmus, L.M. (2014, October). Absolute recidivism estimates for Static-99R and Static-2002R: Current research and recommendations. Paper presented at the 33rd Annual Research and Treatment Conference of the Association for the Treatment of Sexual Abusers, San Diego, CA, October 30, 2014.
[29] Hanson, R. K., Thornton, D., Helmus, L-M, & Babchishin, K. (2015). What sexual recidivism rates are associated with Static-99R and Static-2002R scores? Sexual Abuse: A Journal of Research and Treatment, 1-35.
[30] Donaldson, T., Abbott, B., & Michie, C. (2012). Problems with the Static-99R prediction estimates and confidence intervals. Open Access Journal of Forensic Psychology, 4, 1-23.

* * * * *

*Many thanks to Marcus Boccaccini, Gregory DeClue, Daniel Murrie and other knowledgeable colleagues for their valuable feedback.  


* * * * *

Related blog posts:
  • Static-99 "norms du jour" get yet another makeover (Nov. 17, 2013)
  • Age tables improve sex offender risk estimates (Dec. 1, 2010)
  • New study: Do popular actuarials work? (April 20, 2010)
  • Delusional campaign for a world without risk (April 3, 2010)