IN THE NEWS: Static-99: A bumpy developmental path

By Brian Abbott, PhD and Karen Franklin, PhD*

The Static-99 is the most widely used instrument for assessing sex offenders’ future risk to the public. Indeed, some state governments and other agencies even mandate its use. But bureaucratic faith may be misplaced. Conventional psychological tests go through a standard process of development, beginning with the generation and refinement of items and proceeding through set stages that include pilot testing and replication, leading finally to peer review and formal publication. The trajectory of the Static-99 has been more haphazard: Since its debut 15 years ago, the tool has been in a near-constant state of flux. Myriad changes in items, instructions, norms and real-world patterns of use have cast a shadow over its scientific validity. Here, we chart the unorthodox developmental course of this tremendously popular tool.

Static-99 and 99R Developmental Timeline

Date	Event
1990	The first Sexually Violent Predator (SVP) law passes in the United States, in Washington. A wave of similar laws begins to sweep the nation.
1997	The US Supreme Court upholds the constitutionality of preventive detention of sex offenders.
1997	R. Karl Hanson, a psychologist working for the Canadian prison system, releases a four-item tool to assess sex offender risk. The Rapid Risk Assessment for Sex Offence Recidivism (RRASOR) uses data from six settings in Canada and one in California.[1]
1998	Psychologists David Thornton and Don Grubin of the UK prison system release a similar instrument, the Structured Anchored Clinical Judgment (SACJ-Min) scale.[2]
1999	Hanson and Thornton combine the RRASOR and SACJ-Min to produce the Static-99, which is accompanied by a three-page list of coding rules.[3] The instrument's original validity data derive from four groups of sex offenders, including three from Canada and one from the UK (and none from the United States). The new instrument is atheoretical, with scores interpreted based on the recidivism patterns among these 1,208 offenders, most of them released from prison in the 1970s.
2000	Hanson and Thornton publish a peer-reviewed article on the new instrument.[4]
2003	New coding rules are released for the Static-99, in an 84-page, unpublished booklet that is not peer reviewed.[5] The complex and sometimes counterintuitive rules may lead to problems with scoring consistency, although research generally shows the instrument can be scored reliably.
2003	The developers release a new instrument, the Static-2002, intended to "address some of the weaknesses of Static-99."[6] The new instrument is designed to be more logical and easier to score; one item from the Static-99, pertaining to whether the subject had lived with a lover for at least two years, is dropped due to issues with its reliability and validity. Despite its advantages, the Static-2002 never achieves the popularity of the Static-99 in forensic settings.
2007	Leslie Helmus, a graduate student working with Karl Hanson, reports that contemporary samples of sex offenders have much lower offense rates than did the antiquated, non-US samples upon which the Static-99 was originally developed, both in terms of base rates of offending and rates of recidivism after release from custody.[7]
September 2008	Helmus releases a revised actuarial table for Static-99, with which evaluators may match their subjects' total scores to corresponding estimates of risk.[8] Another Static-99 developer, Amy Phenix, releases the first of several "Evaluators’ Handbooks."[9]
October 2008	At an annual convention of the Association for the Treatment of Sexual Abusers (ATSA), Andrew Harris, a Canadian colleague of Hanson's, releases a new version of the Static-99 with three separate "reference groups" (Complete, CSC and High Risk) to which subjects can be compared. Evaluators are instructed to report a range of risks for recidivism, with the lower bound coming from a set of Canadian prison cases (the so-called CSC, or Correctional Service of Canada group), and the upper bound derived from a so-called "high-risk" group of offenders. The risk of the third, or "Complete," group was hypothesized as falling somewhere between those of the other two groups.[10]
November 2008	At a workshop sponsored by a civil commitment center in Minnesota, Thornton and a government evaluator named Dennis Doren propose yet another new method of selecting among the new reference groups. In a procedure called "cohort matching," they suggest comparing an offender with either the CSC or High-Risk reference group based on how well the subject matches a list of external characteristics they have created but never empirically tested or validated.[11]
December 2008	Phenix and California psychologist Dale Arnold put forth yet another idea for improving the accuracy of the Static-99: After reporting the range of risk based on a combination of the CSC and High-Risk reference groups, evaluators are encouraged to consider a set of external factors, such as whether the offender has dropped out of treatment and the offender's score on Robert Hare's controversial Psychopathy Checklist-Revised (PCL-R). This new method does not seem to catch on.[12] [13]
2009	An official Static-99 website, www.static99.org, debuts.[14]
Winter 2009	The Static-99 developers admit that norms they developed in 2000 are not being replicated: The same score on the Static-99 equates with wide variations in recidivism rates depending on the sample to which it is compared. They theorize that the problem is due to large reductions in Canadian and U.S. recidivism rates since the 1970s-1980s. They call for the development of new norms.[15]
September 2009	Hanson and colleagues roll out a new version of the Static-99, the Static-99R.[16] The new instrument addresses a major criticism by more precisely considering an offender's age at release, an essential factor in reoffense risk. The old Static-99 norms are deemed obsolete. They are replaced by data from 23 samples collected by Helmus for her unpublished Master's thesis. The samples vary widely in regard to risk. For estimating risk, the developers now recommend use of the cohort matching procedure to select among four new reference group options. They also introduce the concepts of percentile ranks and relative risk ratios, along with a new Evaluators’ Workbook for Static-99R and Static-2002R. Instructions for selecting reference groups other than routine corrections are confusing and speculative. Research is lacking to demonstrate that selecting a reference group other than routine corrections produces more accurate risk estimates.[17]
November 2009	Just two months after its introduction, the Evaluators’ Workbook for Static-99R and Static-2002R is withdrawn due to errors in its actuarial tables.[18] The replacement workbook provides the same confusing and speculative method for selecting a nonroutine reference group, a method that lacks scientific validation and reliability.
2010	An international team of researchers presents large-scale data from the United States, New Zealand and Australia indicating that the Static-99 would be more accurate if it took better account of an offender's age.[19] The Static-99 developers do not immediately embrace these researchers' suggestions.
January 2012	Amy Phenix and colleagues introduce a revised Evaluators’ Workbook for Static-99R and Static-2002R.[20] The new manual makes a number of revisions both to the underlying data (including percentile rank and relative risk ratio data) and to the recommended procedure for selecting a reference group. Now, in an increasingly complex procedure, offenders are to be compared to one of three reference groups, based on how many external risk factors they have. The groups are Routine Corrections (low risk), Preselected Treatment Need (moderate risk), and Preselected High Risk Need (high risk). Subsequent research shows that using density of external risk factors to select among the three reference group options is not valid and has no proven reliability.[21] A fourth reference group, Nonroutine Corrections, may be selected using a separate cohort-matching procedure. New research indicates that evaluators who are retained most often by the prosecution are more likely than others to select the high-risk reference group,[22] which has base rates much higher than those in contemporary sexual recidivism studies and will thus produce exaggerated risk estimates.[23]
July 2012	Six months later, the percentile ranks and relative risk ratios are once again modified, with the issuance of the third edition of the Static-99R and Static-2002R Evaluators’ Handbook.[24] No additional data are provided to show that selecting a nonroutine reference group produces more accurate risk estimates than choosing the routine corrections reference group.
October 2012	In an article published in Criminal Justice & Behavior, the developers concede that risk estimates for the 23 offender samples undergirding the Static-99 vary widely. Further, absolute risk levels for typical sex offenders are far lower than previously reported, with the typical sex offender having about a 7% chance of committing a new sex offense within five years. They theorize that the Static-99 might be inflating risk of reoffense due to the fact that the offenders in its underlying samples tended to be higher risk than average.[25]
2012	The repeated refusal of the Static-99 developers to share their underlying data with other researchers, so that its accuracy can be verified, leads to a court order excluding use of the instrument in a Wisconsin case.[26]
October 2013	At an annual ATSA convention, Hanson and Phenix report that an entirely new reference group selection system will be released in a peer-reviewed article in Spring 2014.[27] The new system will include only two reference groups: Routine Corrections and Preselected High Risk High Need. An atypical sample of offenders from a state hospital in Bridgewater, Massachusetts dating back to 1958 is to be removed altogether, along with some other samples, while some new data sets are to be added.
October 2014	At the annual ATSA convention, the developers once again announce that the anticipated rollout of the new system has been pushed back pending acceptance of the manuscript for publication. Helmus nonetheless presents an overview.[28] She reports that the new system will abandon two out of the current four reference groups, retaining only Routine Corrections and Preselected High Risk Need. Evaluators should now use the Routine Corrections norms as the default unless local norms (with a minimum of 100 recidivists) are available. Evaluators will be permitted to choose the Preselected High Risk Need norms based on “strong, case-specific justification.” Neither specific guidance nor empirical evidence to support such a procedure is proffered. A number of other new options for reporting risk information are also presented, including the idea of combining Static-99 data with that from newly developed, so-called "dynamic risk instruments."
January 2015	At an ATSA convention presentation followed by an article in the journal Sexual Abuse,[29] the developers announce further changes in their data sets and how Static-99R scores should be interpreted. Only two of the original four "reference groups" are still standing. Of these, the Routine group has grown by 80% (to 4,325 subjects), while the High-Risk group has shrunk by 35%, to a paltry 860 individuals. Absent from the article is any actuarial table on the High-Risk group, meaning the controversial practice by some government evaluators of inflating risk estimates by comparing sex offenders' Static-99R scores with the High-Risk group data has still not passed any formal peer review process. The developers also correct a previous statistical method as recommended by Ted Donaldson and colleagues back in 2012,[30] the effect of which is to further lower risk estimates in the high-risk group. Only sex offenders in the Routine group with Static-99R scores of 10 are now statistically more likely than not to reoffend. It is unknown how many sex offenders were civilly committed in part due to reliance on the now-obsolete data.

References

[1] Hanson, R. K. (1997). The development of a brief actuarial risk scale for sexual offense recidivism. (Unpublished report 97-04). Ottawa: Department of the Solicitor General of Canada.

[2] Grubin, D. (1998). Sex offending against children: Understanding the risk. Unpublished report, Police Research Series Paper 99. London: Home Office.

[3] Hanson, R.K. & Thornton, D. (1999). Static 99: Improving Actuarial Risk Assessments for Sex Offenders. Unpublished paper.

[4] Hanson, R. K., & Thornton, D. (2000). Improving risk assessments for sex offenders: A comparison of three actuarial scales. Law and Human Behavior, 24(1), 119-136.

[5] Harris, A. J. R., Phenix, A., Hanson, R. K., & Thornton, D. (2003). Static-99 coding rules: Revised 2003. Ottawa, ON: Solicitor General Canada.

[6] Hanson, R.K., Helmus, L., & Thornton, D. (2010). Predicting recidivism amongst sexual offenders: A multi-site study of Static-2002. Law & Human Behavior, 34, 198-211.

[7] Helmus, L. (2007). A multi-site comparison of the validity and utility of the Static-99 and Static-2002 for risk assessment with sexual offenders. Unpublished Honour’s thesis, Carleton University, Ottawa, ON, Canada.

[8] Helmus, L. (2008, September). Static-99 Recidivism Percentages by Risk Level. Last Updated September 25, 2008. Unpublished paper.

[9] Phenix, A., Helmus, L., & Hanson, R.K. (2008, September). Evaluators’ Workbook. Unpublished, September 28, 2008.

[10] Harris, A. J. R., Hanson, R. K., & Helmus, L. (2008). Are new norms needed for Static-99? Workshop presented at the ATSA 27th Annual Research and Treatment Conference on October 23, 2008, Atlanta, GA. Available at www.static99.org.

[11] Doren, D., & Thornton, D. (2008). New Norms for Static-99: A Briefing. A workshop sponsored by Sand Ridge Secure Treatment Center on November 10, 2008. Madison, WI.

[12] Phenix, A. & Arnold, D. (2008, December). Proposed Considerations for Conducting Sex Offender Risk Assessment Draft 12-14-08. Unpublished paper.

[13] Abbott, B. (2009). Applicability of the new Static-99 experience tables in sexually violent predator risk assessments. Sexual Offender Treatment, 1, 1-24.

[14] Helmus, L., Hanson, R. K., & Thornton, D. (2009). Reporting Static-99 in light of new research on recidivism norms. The Forum, 21(1), Winter 2009, 38-45.

[15] Ibid.

[16] Hanson, R. K., Phenix, A., & Helmus, L. (2009, September). Static-99(R) and Static-2002(R): How to Interpret and Report in Light of Recent Research. Paper presented at the 28th Annual Research and Treatment Conference of the Association for the Treatment of Sexual Abusers, Dallas, TX, September 28, 2009.

[17] DeClue, G. & Zavodny, D. (2014). Forensic use of the Static-99R: Part 4. Risk Communication. Journal of Threat Assessment and Management, 1(3), 145-161.

[18] Phenix, A., Helmus, L., & Hanson, R.K. (2009, November). Evaluators’ Workbook. Unpublished, November 3, 2009.

[19] Wollert, R., Cramer, E., Waggoner, J., Skelton, A., & Vess, J. (2010). Recent Research (N = 9,305) Underscores the Importance of Using Age-Stratified Actuarial Tables in Sex Offender Risk Assessments. Sexual Abuse: A Journal of Research and Treatment, 22(4), 471-490. See also: "Age tables improve sex offender risk estimates," In the News blog, Dec. 1, 2010.

[20] Phenix, A., Helmus, L., & Hanson, R.K. (2012, January). Evaluators’ Workbook. Unpublished, January 9, 2012.

[21] Abbott, B.R. (2013). The Utility of Assessing “External Risk Factors” When Selecting Static-99R Reference Groups. Open Access Journal of Forensic Psychology, 5, 89-118.

[22] Chevalier, C., Boccaccini, M. T., Murrie, D. C. & Varela, J. G. (2014), Static-99R Reporting Practices in Sexually Violent Predator Cases: Does Norm Selection Reflect Adversarial Allegiance? Law & Human Behavior.

[23] Abbott (2013) op. cit.

[24] Phenix, A., Helmus, L., & Hanson, R.K. (2012, July). Evaluators’ Workbook. Unpublished, July 26, 2012.

[25] Helmus, Hanson, Thornton, Babchishin, & Harris (2012), Absolute recidivism rates predicted by Static-99R and Static-2002R sex offender risk assessment tools vary across samples: A meta-analysis, Criminal Justice & Behavior. See also: "Static-99R risk estimates wildly unstable, developers admit," In the News blog, Oct. 18, 2012.

[26] State of Wisconsin v. Homer L. Perren Jr., La Crosse County 2010-CI000003. See Franklin, K. "Judge bars Static-99R risk tool from SVP trial: Developers staunchly refused requests to turn over data" (2012, Dec. 14), In the News blog.

[27] Hanson, R.K. & Phenix, A. (2013, October). Report writing for the Static-99R and Static-2002R. Preconference seminar presented at the 32nd Annual Research and Treatment Conference of the Association for the Treatment of Sexual Abusers, Chicago, IL, October 30, 2013. See also: "Static-99 'norms du jour' get yet another makeover," In the News blog, Nov. 17, 2013.

[28] Helmus, L.M. (2014, October). Absolute recidivism estimates for Static-99R and Static-2002R: Current research and recommendations. Paper presented at the 33rd Annual Research and Treatment Conference of the Association for the Treatment of Sexual Abusers, San Diego, CA, October 30, 2014.

[29] Hanson, R. K., Thornton, D., Helmus, L-M, & Babchishin, K. (2015). What sexual recidivism rates are associated with Static-99R and Static-2002R scores? Sexual Abuse: A Journal of Research and Treatment, 1-35.

[30] Donaldson, T., Abbott, B., & Michie, C. (2012). Problems with the Static-99R prediction estimates and confidence intervals. Open Access Journal of Forensic Psychology, 4, 1-23.

* * * * *

*Many thanks to Marcus Boccaccini, Gregory DeClue, Daniel Murrie and other knowledgeable colleagues for their valuable feedback.

