Thursday, September 23, 2010

Static redux: Sandgropers jumping off rickety ship

Inventors of actuarial tools to assess sex offender risk recently described their process as "somewhat like rebuilding a ship at sea, continually replacing one plank at a time when we sprang a leak."

But while they viewed this method "as a strength, not a problem," others might reasonably regard staying aboard such a ship as a sure-fire way to drown.

Among those jumping ship are the Western Australians. Courts in Australia's largest state are increasingly rebuffing the Static-99, the world's leading actuarial tool for assessing sex offender risk.

In the latest of a string of cases, Justice John McKechnie has rejected an attempt by the Director of Public Prosecutions to civilly commit a convicted sex offender based on the man's high Static score. In his ruling, the judge criticized the "uncritical acceptance" of actuarial risk tools, saying they do not take into account reductions in risk that can accrue from prison-based treatment programs.

The ruling in the case of Leslie Fred Free is part of a "national backlash" against the Static by courts as well as legal and psychological researchers, according to a news report. The report quotes Bernadette McSherry, a law professor at Monash University who has been tracking cases in which Australian courts have rejected the Static-99.

Many question whether an instrument developed in North America is relevant or reliable for forensic use Down Under. In a 2007 decision, another Western Australian judge focused on the problem with applying the Static-99 to aboriginal offenders. Noted Judge Hasluck in the case of Robert Mangolamara, a young aboriginal man:

[T]he facts and assumptions underlying the assessment tools and related manuals have not been proved…. Moreover, [because] the tools were not devised for and do not necessarily take account of the social circumstances of indigenous Australians in remote communities, I harbour grave reservations as to whether a person of the respondent's background can be easily fitted within the categories of appraisal presently allowed for by the assessment tools….
Sandgropers, as Western Australians are known, think of themselves as especially good-looking and intelligent. But even here in the less enlightened United States the mainstream forensic community is expressing growing skepticism about the science underlying the actuarials.

Forensic psychiatry journal issues scathing critique

The American Academy of Psychiatry and the Law has just published a scathing critique of the Static-99, saying it lacks each of two elements required for an accurate statistical method of calculating risk: representative samples and uniform measures of outcome.

The must-read article, "Alice in Actuarial-Land" (freely available online), provides a detailed overview of the dizzying changes in recommended norms and procedures for using and interpreting the Static family of instruments over the past decade.

The authors, Shoba Sreenivasan, Linda Weinberger, Allen Frances, and Sarah Cusworth-Walker, illustrate the resultant problems through the hypothetical case of "Mr. X," a 62-year-old rapist.
  • Using the original Static-99 norms, Mr. X's risk of sexually reoffending within 5 years of release falls somewhere between 9.1 percent and 39 percent. Qualitatively, in other words, he falls "somewhere between a low and a high risk."
  • Using the norms provided with the revised Static-99R instrument, Mr. X's risk lies between 8.7 percent and 29.6 percent by 10 years. Again, "somewhere between low and high risk."
Although the case of Mr. X is hypothetical, the judge in the Western Australia case of Mr. Free commented on similar discrepancies between the risk assessments of two forensic psychologists who both relied upon the Static-99. Wrote Judge McKechnie in his insightful analysis:
Dr. [Tara] Yewer's conclusion that [Mr. Free's] risk of re-offending over a five year period is nearly 10 percent more than Ms. [Kirstin] Bouse's conclusion over the same time span is unexplained in the evidence. There is no reason on the evidence why I should reject Ms. Bouse's assessment of 'medium-high' risk or accept in preference Dr. Yewer's assessment of 'high' risk. This application highlights the limitations of STATIC-99.

Any judge who engages in sentencing offenders undertakes some prediction of risk as part of the sentencing process, and does so against a background of principle and experience which suggests that, for example, many young people mature and grow out of crime. Others forsake illicit substances and alcohol and the criminal lifestyle that accompanied them. For some, a period of imprisonment has an actual deterrent effect....

Uncritical acceptance of the STATIC-99 score also negates the whole purpose of [sex offender treatment]. If the programme, to which significant resources are given, has no effect on the risk that participants might then pose to the community, why does the Department of Corrective Services bother with it? … In the present case, I am unable to accept uncritically the risk assessment undertaken by coding the STATIC-99 score, because of earlier reports suggesting the possibility of effective treatment … coupled with the respondent's actual performance [in treatment].
Astonishingly, high-stakes forensic decisions such as criminal sentencing and civil commitment are being made based on such wildly disparate interpretations of data. Also, as the authors of the Journal of the American Academy of Psychiatry and the Law article point out, most of the data underlying the Static norms derive from master's or doctoral-level papers that were never published or subjected to the peer-review process:
Apart from the dizzying number of risk scores and qualifications, the validity of the risk scores themselves is dubious, given different definitions of recidivism in the norming samples, lack of clarity in statistical methods, and an overreliance on unpublished manuscripts and presentations to document methods….

The expression of risk in numerical form, whether it is a risk percentage, a d-statistic, or a receiver operating characteristic (ROC) or risk ratio, gives the trier of fact the impression of the precision of risk to a greater degree of accuracy than actually exists.

The serious nature of the sentencing decisions being made using these norms requires that these risk estimates are getting it right….

Although they purport to be empirically based, the current Static-99 and its newer iteration, the Static-99R, violate the basic tenets of evidence-based medicine that require reasoned, not mechanical, application of group findings to the individual.
Lead authors Sreenivasan and Weinberger are core faculty at the Institute of Psychiatry and Law of the University of Southern California. That's more than 10,000 miles from Western Australia. But perhaps they recently vacationed Down Under, and drank a bit of the Perth water.

If they sailed, I just hope they first checked the ship's safety record. Replacing planks at sea can be risky business. Especially if you hit a patch of rough weather.

Of related interest:
  • For readers interested in learning more about the Dangerous Sexual Offenders Act of Western Australia, as well as the rules of expert evidence admissibility more generally in Australia, I recommend the written decision in the 2007 case of Robert Anthony Mangolamara, available HERE.


  1. In reading Sreenivasan and colleagues’ paper, it became apparent to me that this is a bold attempt to legitimize the combination of clinical judgment with actuarial science. They support their opinions by using straw man arguments about the Static-99/99R. The same critical analysis they applied to the Statics, despite misrepresenting many points, was lacking in their analysis of combining actuarial science with clinical judgment. They provide no references to support this position and in fact the prevailing literature is solidly against such a method, which they neglect to point out (cf. Grove and Meehl, 1996, among others). They essentially argued through their case example that considering two additional risk factors beyond the Static-99R, namely strangulation and a diagnosis of sexual sadism, warrants boosting the respondent’s risk estimate by at least 30 percentage points, assuming the likely standard is set at 50% or greater. How preposterous! There is no information in the literature that would support increasing the actuarially derived risk assessment by 2.5 times its rate using other risk factors in general or the two that Sreenivasan et al. apply in the case example. One only has to look as far as the recent article by Kingston et al. to understand the fallacy of their logic, as well as to illustrate the dangerousness of the method they propose. Kingston et al. (2010) found that the diagnosis of sexual sadism and violence of acts (a possible proxy variable for strangulation) did not add incremental validity beyond the actuarial instrument score (in this research it was the SORAG) in predicting sexual recidivism.

    With decreasing sexual recidivism base rates and the fact that actuarial instruments only account for a fraction of the variables associated with the risk of individuals contained in actuarial samples, it is becoming increasingly difficult for actuarial instruments to forecast sexual recidivism rates at the required legal threshold contemplated by SVP laws. The legitimate criticisms Sreenivasan et al. level against the Static-99/99R are insufficient to justify combining actuarial science with clinical judgment. Rather, weaknesses of the actuarial method should guide clinicians in how to interpret the risk data, as well as to communicate information that will assist the trier of fact in assessing the relevancy, admissibility and weight to assign the actuarial findings. The method propounded by Sreenivasan et al. invalidates the actuarial risk estimates, has no known reliability or validity, and lacks transparency in methods (Janus & Prentky, 2003), which prevents the trier of fact from adequately assessing its relevancy, admissibility, and weight.

    Brian R. Abbott, Ph.D.

  2. Brian,
    Thanks for your insightful comment. Now that I have had a chance to review the authors' accompanying paper on ethics, I fear you are right: Their criticism of the Static instruments is part of a larger effort to legitimize an even more unscientific approach that will enable evaluators to inflate sex offenders' risk to numbers far higher than what the science supports. Scary!

  3. As an Australian and an academic, this article surprises me for a few reasons. Firstly, our political leaders often try to push things which have little utility (or sense!) into practice (internet filter, banning small breasts in pornography, etc.) and sadly, it surprises me that it got to the point in one state at least, that a judge stepped on this foolish tool's use. I would have expected it to become policy without much scrutiny at all.

    Secondly, the fact that this tool was largely created based on the unpublished papers of students, which weren't peer-reviewed, is very disturbing. I suspect the STATIC tools really appeal to those in power for their perceived ability to quantify recidivism down to a number, and absolve them of political liability when things go wrong ("hey, we relied on the STATIC-99 tool, it's not our fault!"). Who cares whether the data used in its development were right?

    Finally, the most troubling thing, although perhaps not surprising, is that a little industry seems to have been set up around STATIC-99, one example of which is certifying people as "STATIC-99 Trainers." Still less surprising is the fact that my state is listed in there, complete with the mug shot of the proud trainer. I have little faith in my state seeing reason and getting rid of it.

    I wonder how hard those with vested interests will fight to keep STATIC-99 around.

    Another great article, Karen. Thanks!

  4. Jim Manley, Ph.D. October 02, 2010

    The authors laid out a few of the difficulties of over-relying on the Static. The practice of forecasting an individual's risk based only on a recent table from an instrument with "moderate" predictive power is shortsighted. Some sort of internal process must occur. Call it "clinical judgment" or "interpreting risk data."

    Of course, the elements considered must have a scientific foundation. Analyzing multiple data sets from scientifically derived sources is where the rubber hits the road.

  5. After reading the comments, I went and read the ethics piece. I was just reading along, finding their "normative ethics" versus "consequentialist ethics" a somewhat interesting approach to things, though the description of the arguments was rather clearly biased in favor of pro-SVP laws. And then I got to the end of the section on normative ethics:
    This discussion demonstrates that there is no one conclusive normative ethic that can simply guide our understanding of the suitability of SVP/SDP statutes and their proper application to a given individual. Once it is accepted that there will not be a simple correct and absolute answer that settles the SVP/SDP debate, the most ethically appropriate position is that of a consequential model
    Okay, I thought, you've now argued against deciding whether or not we should have SVP laws with "normativist ethics" so now you're going to consider them in terms of "consequentialist ethics" and consider arguments for and against them from the perspective of whether they represent good public policy, whether they are a financially effective way of reducing sexual violence, etc. Instead what I get is this:
    namely, to consider the ethical application of SVP/SDP statutes to an individual on a case-by-case basis.
    They presuppose a positive answer to the question "Should we have SVP laws?" Right after they just said that the arguments they had just considered didn't answer that question one way or the other. I mean, like, wow...

  6. Andrew,

    The ethics piece is problematic on a number of levels. I hope to post a reaction soon.

  7. Anonymous from Australia,

    I think you are spot-on about the use of the Static-99 by courts, politicians, bureaucrats, etc. to avoid responsibility. I've seen that here in California, where its use is institutionalized.


    is good follow-up, online article to these sticky issues, imo.

