
May 29, 2012

SVP risk tools show 'disappointing' reliability in real-world use

Rater agreement on three instruments commonly used to assess sex offenders' risk of recidivism is much lower in practice than reported in the tools' manuals, according to a new study out of Florida.

Faring most poorly was the Psychopathy Checklist (PCL-R). Correlations of scores between two evaluators hired by the same agency were in the low range. On average, psychologists differed by five points on the instrument, which has a score range of zero to 40. In one case, two evaluators' scores were a whopping 24 points apart!

Agreement among evaluators was only moderate on the Static-99 and the MnSOST-R, two actuarial risk assessment instruments for which scoring is relatively more straightforward.

The study, published in the respected journal Psychological Assessment, was a collaboration between scholars from the Department of Mental Health Law and Policy at the University of South Florida and researchers with the Florida Department of Children and Families. It utilized archived records culled from the almost 35,000 individuals screened for possible civil commitment as Sexually Violent Predators (SVPs) in Florida between 1999 and 2009. The researchers located 315 cases in which the same individual was evaluated by separate clinicians who each administered both the PCL-R and at least one of the two actuarial measures within a short enough time frame to enable direct scoring comparisons.
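
For readers who want to see how such field reliability figures are put together, here is a minimal sketch in Python. The paired scores are invented for illustration (they are not the Florida data), and the two summary statistics shown -- the correlation between raters and the average absolute disagreement -- are only a couple of the agreement indices such studies typically report (the intraclass correlation is the more standard one).

```python
# Minimal sketch (hypothetical data): summarizing field reliability for
# paired PCL-R totals. The score pairs below are invented for illustration;
# they are NOT the Florida study's data.
import numpy as np

# Each row: (evaluator A's total, evaluator B's total) for the same offender
pairs = np.array([
    [28, 23], [15, 20], [32, 30], [11, 18], [25, 24],
    [30, 22], [19, 25], [22, 17], [27, 29], [13,  9],
])
a, b = pairs[:, 0], pairs[:, 1]

r = np.corrcoef(a, b)[0, 1]              # correlation between the two raters
mean_abs_diff = np.mean(np.abs(a - b))   # average disagreement, in PCL-R points
max_diff = np.max(np.abs(a - b))         # largest single disagreement

print(f"Pearson r between raters:    {r:.2f}")
print(f"Mean absolute difference:    {mean_abs_diff:.1f} points (0-40 scale)")
print(f"Largest single disagreement: {max_diff} points")
```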

It would be a mistake to lean too heavily on the results of a single, isolated study. But the present study adds to a burgeoning body of research from several independent groups, all pointing to troubling problems with the accuracy of instruments designed to forecast risk of recidivism among sex offenders.

Related study: Psychopathy and sexual deviance not predictive

Collectively, the research has been especially critical of the ability of the highly prejudicial construct of psychopathy to add meaningfully to risk prediction in this high-stakes arena. Indeed, just this week another study has come out indicating that neither psychopathy scores nor sexual deviance measures improve on the accuracy provided by an actuarial instrument alone.

An especially interesting finding of that Canadian study is that reoffense rates remained below 12 percent over a six-year follow-up period even for the highest-risk offenders -- those with high risk ratings on the Static-99R plus high levels of psychopathy and sexual deviance (as measured by phallometric testing). This makes it inappropriate to inflate risk estimates over and above those derived from Static-99R scores alone, the authors caution.

Item-level analysis finds varying rates of reliability

A unique contribution of the Florida study is its analysis of the reliability of every single item in each of the three instruments studied. Handy tables allow a forensic practitioner to see which items have the poorest reliability, meaning they should be viewed skeptically by forensic decision-makers.

For example, take the MnSOST-R, a now-defunct instrument with a score range of –14 to 31 points. The total gap between evaluators was as wide as 19 points; the items with the greatest variability in scoring were those pertaining to offenders' functioning during incarceration, such as participation in treatment.

Meanwhile, the weak performance of the Psychopathy Checklist owes much to the items on its so-called “Factor 1,” which attempt to measure the personality style of the psychopath. As I've discussed before, rating someone as “glib,” “callous” or “shallow” is a highly subjective enterprise that opens the door to a veritable avalanche of personal bias.

Piggy-backing off a recommendation by John Edens and colleagues, the Florida team suggests that the prejudicial deployment of the Psychopathy Checklist may be superfluous, in that scores on Factor 2 alone (the items reflecting a chronic criminal lifestyle) are more predictive of future violence or sexual recidivism.

Next up, we need to identify the causes of the poor interrater reliability for forensic risk prediction instruments in real-world settings. Is it due to inadequate training, differing clinical skills, variable access to collateral data, intentional or unintentional bias on the part of examiners, adversarial allegiance effects (not a factor in the present study, since both evaluators were appointed by the same agency), or some combination?

In the meantime, the fact that two evaluators working on the same side cannot reliably arrive at the same risk rating for any particular individual should certainly raise our skepticism about the validity of risk prediction based on these instruments.

The studies are:

Reliability of Risk Assessment Measures Used in Sexually Violent Predator Proceedings. Cailey Miller, Eva Kimonis, Randy Otto, Suzonne Kline and Adam Wasserman. Psychological Assessment. Advance online publication, 7 May 2012. Click HERE to contact the authors.

Does Consideration of Psychopathy and Sexual Deviance Add to the Predictive Validity of the Static-99R? Jan Looman, Nicola A. C. Morphett and Jeff Abracen. International Journal of Offender Therapy and Comparative Criminology. Published online 28 May 2012. Click HERE to contact the authors.



September 3, 2015

Adversarial allegiance: Frontier of forensic psychology research

A colleague recently commented on how favorably impressed he was by the open-mindedness of two other forensic examiners, who had had the courage to change their opinions in the face of new evidence. The two had initially recommended that a man be civilly committed as a sexually violent predator, but changed their minds three years later.

My colleague's admiration was short-lived. It evaporated when he realized that the experts’ change of heart had come only after they switched teams: Initially retained by the government, they were now in the employ of the defense.

"Adversarial allegiance" is the name of this well-known phenomenon in which some experts' opinions tend to drift toward the party retaining their services. This bias is insidious because it operates largely outside of conscious awareness, and can affect even ostensibly objective procedures such as the scoring and interpretation of standardized psychological tests.

Partisan bias is nothing new to legal observers, but formal research on its workings is in its infancy. Now, the researchers spearheading the exploration of this intriguing topic have put together a summary review of the empirical evidence they have developed over the course of the past decade. The review, by Daniel Murrie of the Institute of Law, Psychiatry and Public Policy at the University of Virginia and Marcus Boccaccini of Sam Houston State University, is forthcoming in the Annual Review of Law and Social Science.

Forensic psychologists’ growing reliance on structured assessment instruments gave Murrie and Boccaccini a way to systematically explore partisan bias. Because many forensic assessment tools boast excellent interrater reliability in the laboratory, the team could quantify the degradation of fidelity that occurs in real-world settings. And when scoring trends correlate systematically with which side the evaluator is testifying for, adversarial allegiance is a plausible culprit.

Such bias has been especially pronounced with the Psychopathy Checklist-Revised, which is increasingly deployed as a weapon by prosecutors in cases involving future risk, such as capital murder sentencing hearings, juvenile transfer to adult courts, and sexually violent predator commitment trials. In a series of ground-breaking studies, the Murrie-Boccaccini team found that scores on the PCL-R vary hugely and systematically based on whether an expert is retained by the prosecution or the defense, with differences often larger than chance alone can plausibly explain.
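
To make the pattern concrete, here is a minimal sketch of the kind of comparison involved, using invented PCL-R totals rather than the team's actual data: group the scores by which side retained the evaluator, test whether the gap is larger than chance, and express it as a standardized effect size.

```python
# Minimal sketch (hypothetical data): do PCL-R totals differ by retaining party?
# The scores below are invented for illustration, not the Murrie-Boccaccini data.
import numpy as np
from scipy import stats

prosecution_scores = np.array([31, 28, 33, 29, 30, 34, 27, 32])
defense_scores     = np.array([24, 22, 27, 20, 25, 23, 26, 21])

t, p = stats.ttest_ind(prosecution_scores, defense_scores)

# Cohen's d using the pooled standard deviation
n1, n2 = len(prosecution_scores), len(defense_scores)
pooled_sd = np.sqrt(((n1 - 1) * prosecution_scores.var(ddof=1) +
                     (n2 - 1) * defense_scores.var(ddof=1)) / (n1 + n2 - 2))
d = (prosecution_scores.mean() - defense_scores.mean()) / pooled_sd

print(f"t = {t:.2f}, p = {p:.4f}, Cohen's d = {d:.2f}")
```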

Systematic bias was also found in the scoring of two measures designed to predict future sexual offending, the popular Static-99 and the now-defunct Minnesota Sex Offender Screening Tool Revised (MnSOST-R).

One shortcoming of the team’s initial observational research was that it couldn’t eliminate the possibility that savvy attorneys had preselected experts who were predisposed toward one side or the other. To test this possibility, two years ago the team designed a devious experimental study in which they recruited forensic psychologists and psychiatrists and randomly assigned them to either a prosecution or defense legal unit. To increase validity, the experts were even paid $400 a day for their services.

The findings provided proof-positive of the strength of the adversarial allegiance effect. Forensic experts assigned to the bogus prosecution unit gave higher scores on both the PCL-R and the Static-99R than did those assigned to the defense. The pattern was especially pronounced on the PCL-R, due to the subjectivity of many of its items. ("Glibness" and "superficiality," for example, cannot be objectively measured.)

The research brought further bad tidings. Even when experts assign the same score on the relatively simple Static-99R instrument, they often present these scores in such a way as to exaggerate or downplay risk, depending on which side they are on. Specifically, prosecution-retained experts are far more likely to endorse use of "high-risk" norms that significantly elevate risk.

Several somewhat complementary theories have been advanced to explain why adversarial allegiance occurs. Prominent forensic psychologist Stanley Brodsky has attributed it to the social psychological process of in-group allegiance. Forensic psychologists Tess Neal and Tom Grisso have favored a more cognitive explanation, positing heuristic biases such as the human tendency to favor confirmatory over disconfirmatory information. More cynically, others have attributed partisan bias to conscious machinations in the service of earning more money. Murrie and Boccaccini remain agnostic, saying that all of these factors could play a role, depending upon the evaluator and the situation.

One glimmer of hope is that the allegiance effect is not universal. The research team found that only some of the forensic experts they studied are swayed by which side retains them. Hopefully, the burgeoning interest in adversarial allegiance will lead to future research exploring not only the individual and situational factors that trigger bias, but also what keeps some experts from shading their opinions toward the retaining party.

Even better would be if the courts took an active interest in this problem of bias. Some Australian courts, for example, have introduced a method called "hot tubs," in which experts for all sides must come together and hash out their differences outside of court.

In the meantime, watch out if someone tries to recruit you at $400 a day to come and work for a newly formed legal unit. It might be another ruse, designed to see how you hold up to adversarial pressure.

* * * * *

The article is: Adversarial Allegiance among Expert Witnesses, forthcoming from The Annual Review of Law and Social Science. To request it from the first author, click HERE



May 30, 2010

Psychopathy guru blocks critical article

Will case affect credibility of PCL-R test in court?

Despite recent evidence that scores on the Psychopathy Checklist-Revised (PCL-R) vary widely in adversarial legal contexts depending on which party retained the evaluator, the test has become increasingly popular in forensic work. In Texas, indeed, Sexually Violent Predator (SVP) evaluators are required by statute to measure psychopathy; almost all use this test. It is not surprising that prosecutors find the PCL-R particularly attractive: Evidence of high psychopathy has a powerfully prejudicial impact on jurors deciding whether a capital case defendant or a convicted sex offender is at high risk for bad conduct in the future.

But a current effort by the instrument's author, Robert Hare, to suppress publication of a critical article in a leading scientific journal may paradoxically reduce the credibility of the construct of psychopathy in forensic contexts.

That's the opinion of two psychology-law leaders, psychologist Norman Poythress and attorney John Petrila of the University of South Florida (two authors of a leading forensic psychology text, Psychological Evaluations for the Courts), in a critical analysis of Dr. Hare's threat to sue the journal Psychological Assessment. The contested article, "Is Criminal Behavior a Central Component of Psychopathy? Conceptual Directions for Resolving the Debate," is authored by prominent scholars Jennifer Skeem of UC Irvine and David Cooke of Glasgow University. The study remains unpublished.

"[T]he threat of litigation constitutes a serious threat to academic freedom and potentially to scientific progress," write Poythress and Petrila in the current issue of the International Journal of Forensic Mental Health. "Academic freedom rests on the premise that advances in science can only occur if scholars are permitted to pursue free competition among ideas. This assumes that scholars have the liberty to do their work free from limitations imposed by political or religious pressure or by economic reprisals."

According to Poythress and Petrila, after the critical article passed the peer-review process and was accepted for publication, Dr. Hare's lawyer sent a letter to the authors and the journal stating that Dr. Hare and his company would "have no choice but to seek financial damages from your publication and from the authors of the article, as well as a public retraction of the article" if it was published. The letter claimed that Skeem and Cooke's paper was "fraught with misrepresentations and other problems and a completely inaccurate summary of what amounts to [Hare's] life's work" and "deliberately fabricated or altered quotes of Dr. Hare, and substantially altered the sense of what Dr. Hare said in his previous publications."

In general, defamation claims must prove that a defendant made a false and defamatory statement that harmed the plaintiff's reputation. Truth is an absolute defense. Critical opinions are also protected from defamation actions, as are "fair comments" on matters of public interest.

In this case, the contents of Skeem and Cooke's contested article have not been made public. However, it is hard to see how critical analysis of a construct that is enjoying such unprecedented popularity and real-world impact would NOT be of public interest.

Poythress and Petrila express concern that defamation claims against opposing researchers, while traditionally rare, may be becoming more common, leading to a potentially chilling effect on both individual researchers and the broader scientific community. Like so-called SLAPPs -- Strategic Lawsuits Against Public Participation -- used by corporations and other special interest groups to impede public participation, even meritless defamation lawsuits exact heavy penalties in lost time, money and emotional distress.

Judges have been critical of pretextual deployment of defamation lawsuits, Poythress and Petrila report; a judge in one case warned that "plaintiffs cannot, simply by filing suit and crying 'character assassination!,' silence those who hold divergent views, no matter how adverse those views may be to plaintiffs' interests. Scientific controversies must be settled by the methods of science rather than by the methods of litigation."

Potential negative effects of defamation threats against scientific researchers include:
  1. Researchers avoid conducting critical research out of fear of lawsuits.
  2. Academics decline to serve as volunteer peer reviewers for academic journals due to loss of anonymity in defamation suits.
  3. Journal editors self-censor on controversial topics.
As Poythress and Petrila conclude:

Because publication of the article by Professors Skeem and Cooke has effectively been long delayed, if not ultimately suppressed, one clear impact of this threat to sue is that researchers who may have been willing to investigate alternative models of psychopathy that might have been derived from the Skeem and Cooke article are not able to do so, simply because the article is unavailable. Because science progresses, in part, both by confirming viable models and disconfirming nonviable ones, the suppression of information relevant to constructing candidate models for empirical evaluation can be viewed as impeding the progress of science….

[I]t seems clear from our review that such threats strike at the heart of the peer review process, may have a chilling effect on the values at the core of academic freedom, and may potentially impede the scientific testing of various theories, models and products.
In our view it is far better to debate such matters in peer review journals rather than cut off debate through threats of litigation.
In court, meanwhile, the effects of Dr. Hare's threat may prove paradoxical. Attorneys whose clients could be prejudiced by introduction of the Psychopathy Checklist may be able to discredit the instrument by pointing to the suppression of critical literature about the underlying construct of psychopathy.

POSTSCRIPT: Just hours after I posted this, alert readers advised me that: (1) Dr. Skeem discusses the as-yet-unpublished article in her 2009 book, Psychological Science in the Courtroom: Consensus and Controversy, co-edited with Kevin Douglas and Scott O. Lilienfeld (page 179 in the Google book view is HERE), and (2) according to Dr. Hare's website, he has a response in press (which, ironically, cites the Skeem and Cooke article as being published last year).

The full article is: "PCL-R Psychopathy: Threats to Sue, Peer Review, and Potential Implications for Science and Law. A Commentary," by Norman Poythress and John P. Petrila, in the current issue of the International Journal of Forensic Mental Health. The abstract is available HERE; the full article requires a subscription.

Dr. Hare's response is: Hare, R. D., & Neumann, C. S. (in press). The role of antisociality in the psychopathy construct: Comment on Skeem and Cooke (2009). Psychological Assessment.

Of related interest:

  • "The Dark Side of Peer Review," by Stephen D. Hart, also in the current issue of the International Journal of Forensic Mental Health (abstract HERE)

  • "Does interrater (dis)agreement on Psychopathy Checklist scores in Sexually Violent Predator trials suggest partisan allegiance in forensic evaluations?" by Murrie, D.C., Boccaccini, M.T., Johnson, J.T., & Janke, C. (2008). Law & Human Behavior, 32, 352-362 (abstract HERE)

September 4, 2013

'Authorship bias' plays role in research on risk assessment tools, study finds

Reported predictive validity higher in studies by an instrument's designers than by independent researchers

The use of actuarial risk assessment instruments to predict violence is becoming more and more central to forensic psychology practice. And clinicians and courts rely on published data to establish that the tools live up to their claims of accurately separating high-risk from low-risk offenders.

But as it turns out, the predictive validity of risk assessment instruments such as the Static-99 and the VRAG depends in part on the researcher's connection to the instrument in question.

Publication bias in pharmaceutical research has been well documented.

Published studies authored by tool designers reported predictive validity findings around two times higher than investigations by independent researchers, according to a systematic meta-analysis that included 30,165 participants in 104 samples from 83 independent studies.
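
As a rough illustration of what a two-to-one gap in predictive validity looks like in meta-analytic terms, here is a minimal sketch. The AUC values are invented placeholders, not figures from the Singh, Grann and Fazel paper; the AUC-to-d conversion shown is one standard way such findings are pooled.

```python
# Minimal sketch (hypothetical numbers): comparing predictive validity reported
# by tool designers versus independent researchers. AUCs below are invented.
import numpy as np
from scipy.stats import norm

designer_aucs    = np.array([0.78, 0.75, 0.80, 0.76, 0.79])
independent_aucs = np.array([0.65, 0.63, 0.67, 0.62, 0.66])

def auc_to_d(auc):
    """Convert an AUC to a Cohen's d-type effect size (AUC = Phi(d / sqrt(2)))."""
    return norm.ppf(auc) * np.sqrt(2)

d_designer    = auc_to_d(designer_aucs).mean()
d_independent = auc_to_d(independent_aucs).mean()

print(f"Mean effect size, tool designers:      d = {d_designer:.2f}")
print(f"Mean effect size, independent authors: d = {d_independent:.2f}")
print(f"Ratio: {d_designer / d_independent:.1f}x")
```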

Conflicts of interest shrouded

Compounding the problem, in not a single case did instrument designers openly report this potential conflict of interest, even when a journal's policies mandated such disclosure.

As the study authors point out, an instrument’s designers have a vested interest in their procedure working well. Financial profits from manuals, coding sheets and training sessions depend in part on the perceived accuracy of a risk assessment tool. Indirectly, developers of successful instruments can be hired as expert witnesses, attract research funding, and achieve professional recognition and career advancement.

These potential rewards may make tool designers more reluctant to publish studies in which their instrument performs poorly. This "file drawer problem," well established in other scientific fields, has led to a call for researchers to publicly register intended studies in advance, before their outcomes are known.

The researchers found no evidence that the authorship effect was due to higher methodological rigor in studies carried out by instrument designers, such as better inter-rater reliability or more standardized training of instrument raters.

"The credibility of future research findings may be questioned in the absence of measures to tackle these issues," the authors warn. "To promote transparency in future research, tool authors and translators should routinely report their potential conflict of interest when publishing research investigating the predictive validity of their tool."

The meta-analysis examined all published and unpublished research on the nine most commonly used risk assessment tools over a 45-year period:
  • Historical, Clinical, Risk Management-20 (HCR-20)
  • Level of Service Inventory-Revised (LSI-R)
  • Psychopathy Checklist-Revised (PCL-R)
  • Spousal Assault Risk Assessment (SARA)
  • Structured Assessment of Violence Risk in Youth (SAVRY)
  • Sex Offender Risk Appraisal Guide (SORAG)
  • Static-99
  • Sexual Violence Risk-20 (SVR-20)
  • Violence Risk Appraisal Guide (VRAG)

Although the researchers were not able to break down so-called "authorship bias" by instrument, the effect appeared more pronounced with actuarial instruments than with instruments that use structured professional judgment, such as the HCR-20. The majority of the samples in the study involved actuarial instruments. The three most common instruments studied were the Static-99 and VRAG, both actuarials, and the PCL-R, a measure of psychopathy that has been criticized for its vulnerability to partisan allegiance and other subjective examiner effects.

This is the latest important contribution by the hard-working team of Jay Singh of Molde University College in Norway and the Swiss Department of Justice; the late Martin Grann of the Centre for Violence Prevention at the Karolinska Institute in Stockholm, Sweden; and Seena Fazel of Oxford University.

A goal was to settle once and for all a dispute over whether the authorship bias effect is real. The effect was first reported in 2008 by the team of Blair, Marcus and Boccaccini, in regard to the Static-99, VRAG and SORAG instruments. Two years later, the co-authors of two of those instruments, the VRAG and SORAG, fired back a rebuttal disputing the allegiance effect. However, Singh and colleagues say the statistic the rebuttal relied upon, the area under the receiver operating characteristic curve (AUC), may not have been up to the task, and its authors "provided no statistical tests to support their conclusions."

Prominent researcher Martin Grann dead at 44

Sadly, this will be the last contribution to the violence risk field by team member Martin Grann, who has just passed away at the young age of 44. His death is a tragedy for the field. Writing in the legal publication Dagens Juridik, editor Stefan Wahlberg noted Grann's "brilliant intellect" and "genuine humanism and curiosity":
Martin Grann came in the last decade to be one of the most influential voices in both academic circles and in the public debate on matters of forensic psychiatry, risk and hazard assessments of criminals and ... treatment within the prison system. His very broad knowledge in these areas ranged from the law on one hand to clinical therapies at the individual level on the other -- and everything in between. This week, he would also debut as a novelist with the book "The Nightingale."

The article, Authorship Bias in Violence Risk Assessment? A Systematic Review and Meta-Analysis, is freely available online via PloS ONE (HERE).


May 25, 2011

NPR series on psychopathy in court

This Thursday and Friday, NPR is airing a 2-part series about the controversial use of psychopathy in court.

In Part I on All Things Considered, psychology and science reporter Alix Spiegel shows the profound negative consequences of psychopathy on the lives of those so designated. She profiles a California prisoner whose friends and family do not believe he is a psychopath, but who will likely never win parole due to that label.

Part II (on Friday) examines the history of the Psychopathy Checklist (PCL-R) and explores how it became so entrenched in the criminal justice system. Dan Murrie’s research questioning the reliability of the PCL in adversarial contexts is highlighted. Robert Hare also expresses concerns about the way his tool is being used in court, and the corrupting influences of the adversarial system.

The show will be complemented by an online debate, “Does the PCL-R test have value in the criminal justice system?” I am one of the featured experts on the panel, so I hope some of you will visit the website; I will provide a link just as soon as it goes live.

For those of you who would rather tune in once, the show is also airing as a single piece on Friday on the radio program This American Life, and you will be able to download the podcast there as well.

Part I of the 2-part series is HERE; the online debate is HERE.

August 2, 2012

Violence risk instruments overpredicting danger

Tools better at screening for low risk than pinpointing high risk 


The team of Seena Fazel and Jay Singh are at it again, bringing us yet another gigantic review of studies on the accuracy of the most widely used instruments for assessing risk of violence and sexual recidivism.


This time, the prolific researchers -- joined by UK statistician Helen Doll and Swedish professor Martin Grann -- report on a total of 73 research samples comprising 24,847 people from 13 countries. Cumulatively, the samples had a high base rate of reoffense, with almost one in four reoffending over an average of about four years.

Bottom line: Risk assessment instruments are fairly good at identifying low risk individuals, but their high rates of false positives -- people falsely flagged as recidivists -- make them inappropriate “as sole determinants of detention, sentencing, and release.”

In all, about four out of ten of those individuals judged to be at moderate to high risk of future violence went on to violently offend. Prediction of sexual reoffense was even poorer, with less than one out of four of those judged to be at moderate to high risk going on to sexually offend. In samples with lower base rates, the researchers pointed out, predictive accuracy will be even poorer.

What that means, in practical terms, is that to stop one person who will go on to become violent again in the future, society must lock up at minimum one person who will NOT; for sex offenders, at least three non-recidivists must be detained for every recidivist. This, of course, is problematic from a human rights standpoint. 
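
The arithmetic behind that "number needed to detain" point is simple enough to sketch. The positive predictive values below are approximations of the figures quoted above, so the exact ratios should be read as illustrative.

```python
# Minimal sketch: how many non-recidivists are detained for each true recidivist,
# given a positive predictive value (PPV). PPVs below approximate the review's
# "about four in ten" (violence) and "less than one in four" (sexual) figures.
def non_recidivists_per_recidivist(ppv):
    """For every true recidivist detained, how many non-recidivists come with them?"""
    return (1 - ppv) / ppv

ppv_violence = 0.41
ppv_sexual   = 0.23

print(f"Violence: {non_recidivists_per_recidivist(ppv_violence):.1f} non-recidivists per recidivist")
print(f"Sexual:   {non_recidivists_per_recidivist(ppv_sexual):.1f} non-recidivists per recidivist")
```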

Another key finding, contrary to conventional wisdom and to some previous reviews, was that actuarial instruments focusing on historical risk factors performed no better than tools based on clinical judgment.

The researchers included the nine most commonly used risk assessment tools, out of the many dozens that have now been developed around the world:
  • Level of Service Inventory-Revised (LSI-R) 
  • Psychopathy Checklist-Revised (PCL-R) 
  • Sex Offender Risk Appraisal Guide (SORAG) 
  • Static-99 
  • Violence Risk Appraisal Guide (VRAG) 
  • Historical, Clinical, Risk management-20 (HCR-20) 
  • Sexual Violence Risk-20 (SVR-20) 
  • Spousal Assault Risk Assessment (SARA) 
  • Structured Assessment of Violence Risk in Youth (SAVRY) 
Team leader Fazel, of Oxford University, and colleagues stressed several key implications of their findings:
One implication of these findings is that, even after 30 years of development, the view that violence, sexual, or criminal risk can be predicted in most cases is not evidence based. This message is important for the general public, media, and some administrations who may have unrealistic expectations of risk prediction for clinicians. 

A second and related implication is that these tools are not sufficient on their own for the purposes of risk assessment. In some criminal justice systems, expert testimony commonly uses scores from these instruments in a simplistic way to estimate an individual’s risk of serious repeat offending. However, our review suggests that risk assessment tools in their current form can only be used to roughly classify individuals at the group level, and not to safely determine criminal prognosis in an individual case. 

Finally, our review suggests that these instruments should be used differently. Since they had higher negative predictive values, one potential approach would be to use them to screen out low risk individuals. Researchers and policy makers could use the number safely discharged to determine the potential screening use of any particular tool, although its use could be limited for clinicians depending on the immediate and service consequences of false positives. 

A further caveat is that specificities were not high -- therefore, although the decision maker can be confident that a person is truly low risk if screened out, when someone fails to be screened out as low risk, doctors cannot be certain that this person is not low risk. In other words, many individuals assessed as being at moderate or high risk could be, in fact, low risk. 
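
A toy two-by-two table makes the screening logic concrete. The counts below are invented, chosen only to mimic the general pattern the review describes: a respectable negative predictive value alongside a mediocre positive predictive value and modest specificity.

```python
# Minimal sketch (hypothetical 2x2 table): why the tools screen out low risk
# better than they pinpoint high risk. All counts are invented for illustration.
tp, fp = 40, 60    # flagged moderate/high risk: 40 later reoffend, 60 do not
fn, tn = 10, 140   # screened out as low risk: 10 later reoffend, 140 do not

ppv = tp / (tp + fp)          # confidence that a flagged person will reoffend
npv = tn / (tn + fn)          # confidence that a screened-out person will not
sensitivity = tp / (tp + fn)  # share of eventual recidivists who were flagged
specificity = tn / (tn + fp)  # share of non-recidivists who were screened out

print(f"PPV {ppv:.2f} | NPV {npv:.2f} | sensitivity {sensitivity:.2f} | specificity {specificity:.2f}")
```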

My blog post on these researchers' previous meta-analytic study, Violence risk meta-meta: Instrument choice does matter, is HERE.

February 5, 2013

Texas SVP jurors ignoring actuarial risk scores

Expert witness for defense makes a (small) difference, study finds

The fiery debates surrounding the validity of actuarial tools for predicting violence risk raise a basic question: How much influence do these instruments really have on legal decision-makers? The answer, at least when it comes to jurors in Sexually Violent Predator trials in Texas:

Not much.

"Despite great academic emphasis on risk measures - and ongoing debates about the value, accuracy, and utility of risk-measure scores reported in SVP hearings - our findings suggest these risk measure scores may have little impact on jurors in actual SVP hearings."

The researchers surveyed 299 jurors at the end of 26 sexually violent predator trials. Unfortunately, they could not directly measure the relationship between risk scores and civil commitment decisions because, this being Texas, juries slam-dunked 25 out of 26 sex offenders, hanging in only one case (which ultimately ended in commitment after a retrial).  

Instead of the ultimate legal outcome, the researchers had to rely on proxy outcome measures, including jurors' ratings of how dangerous an individual was (specifically, how likely he would be to commit a new sex offense within one year of release), and their assessment of how difficult it was to make a decision in their case.

There was no evidence that jurors' assessments of risk or decision difficulty varied based on respondents' scores on risk assessment tools, which in each case included the Static-99, MnSOST-R and the PCL-R. This finding, by the prolific team of Marcus Boccaccini, Daniel Murrie and colleagues, extends to actual trials the prior mock-trial evidence that jurors in capital cases and other legal proceedings involving psychology experts are more heavily influenced by clinical than by actuarial testimony.

What did make a difference to jurors was whether the defense called at least one witness, and in particular an expert witness. Overall, there was a huge imbalance in expert testimony, with almost all of the trials featuring two state experts, but only seven of 26 including even one expert called by the defense.

"Skepticism effect"

The introduction of a defense expert produced a "skepticism effect," the researchers found, in which jurors became more skeptical of experts' ability to predict future offending. However, jurors' lower risk ratings in these cases could also have been due to real differences in the cases. In SVP cases involving legitimately dangerous sex offenders, defense attorneys often have trouble finding experts willing to testify. In other words, the researchers note, "the reduced ratings of perceived risk associated with the presence of a defense expert may be due to nonrandom selection … as opposed to these defense experts' influencing jurors."

A back story here pertains to the jury pool in the Texas county in which civil commitment trials are held. All SVP trials take place in Montgomery County, a "very white community," an attorney there told me. A special e-juror selection process for SVP jurors whitens the jury pool even more, disproportionately eliminating Hispanics and African Americans. Meanwhile, many of those being referred for civil commitment are racial minorities. The potentially unconstitutional racial discrepancy is the basis for one of many current legal challenges to the SVP system in Texas.

Once a petition for civil commitment as a sexually violent predator is filed in Texas, the outcome is a fait accompli. Since the inception of the state's SVP law, only one jury has unanimously voted against civil commitment. Almost 300 men have been committed, and not a single one has been released.

Overall, the broad majority of jurors in the 26 SVP trials were of the opinion that respondents were likely to reoffend in the next year. Based on this heightened perception of risk, the researchers hypothesize that jurors may have found precise risk assessment ratings irrelevant because any risk was enough to justify civil commitment.

In a previous survey of Texas jurors, more than half reported that even a 1 percent chance of recidivism was enough to qualify a sex offender as dangerous. To be civilly committed in Texas, a sex offender must be found "likely" to reoffend, but the state's courts have not clarified what that term means.  

Risk scores could also be irrelevant to jurors motivated more by a desire for retribution than a genuine wish to protect the public, the researchers pointed out. "Although SVP laws are ostensibly designed to provide treatment and protect the public, experimental research suggests that many mock jurors make civil commitment decisions based more on retributive motives - that is, the desire to punish sexual offenses—than the utilitarian goal of protecting the public…. Jurors who adopt this mindset may spend little time thinking about risk-measure scores."

All this is not to say that actuarial scores are irrelevant. They are highly influential in the decisions that take place leading up to an SVP trial, including administrative referrals for full evaluations, the opinions of the evaluators themselves as to whether an offender meets civil commitment criteria, and decisions by prosecutors as to which cases to select for trial.

"But the influence of risk scores appears to end at the point when laypersons make decisions about civilly committing a select subgroup of sexual offenders," the researchers noted.

Bottom line: Once a petition for civil commitment as a sexually violent predator is filed in Texas, it's the end of the line. The juries are ultra-punitive, and the deck is stacked, with government experts outnumbering experts called by the defense in every case. It remains unclear to what extent these results might generalize to SVP proceedings in other states with less conservative jury pools and/or more balanced proceedings.

  • The study, "Do Scores From Risk Measures Matter to Jurors?" by Marcus Boccaccini, Darrel Turner, Craig Henderson and Caroline Chevalier of Sam Houston State University and Daniel Murrie of the University of Virginia, is slated for publication in an upcoming issue of Psychology, Public Policy, and Law. To request a copy, email the lead researcher (HERE).

February 17, 2010

Forensic psychiatrist: Courts fostering "POLITICAL DIAGNOSIS"

After sex offenders, who will be next?

More leading experts are starting to notice and voice alarm over the pretextual use of psychiatric diagnoses in SVP civil commitment cases. In an editorial this week, a prominent forensic psychiatrist quotes the late Michael Crichton, calling it "bad science 'tricked out' for public policy ends."

Writing in the Psychiatric Times, James Knoll, psychiatry professor at SUNY-Syracuse and director of a forensic fellowship program, critiques both the questionable diagnoses and the shaky risk assessment techniques being used to civilly commit Sexually Violent Predators:
A variety of instruments have been developed (PCL-R, Static-99, Phallometry, Minnesota Sex Offender Screening Tool, etc.); however, these tests are often challenged in courts as not meeting legal standards for expert evidence and testimony. So while the research database has grown, the question remains: is it reliable enough to be proffered as expert testimony? Experts in the field continue to have serious reservations, and express caution about the (mis)use of these instruments for expert testimony.
Turning to the questionable diagnoses being used in SVP cases, Knoll puts the onus squarely on the U.S. Supreme Court for creating a tautological and "politico-legal" definition of "mental disorder or abnormality" for use in these civil commitment proceedings:
[T]he courts may use our diagnoses when they choose to, and they may ignore them and/or devise their own if it suits public policy…. Since it is forensic mental health professionals who are tasked with SVP evaluations, they have attempted to give this term meaning within the confines of their science. Have these attempts reached a consensus? It would appear that they have not. There continues to be substantial disagreement….

When psychiatric science becomes co-opted by a political agenda, an unhealthy alliance may be created. It is science that will always be the host organism, to be taken over by political viruses…. [P]sychiatry may come to resemble a new organism entirely -- one that serves the ends of the criminal justice system.
If we want to know where all this is headed if someone doesn't slam on the brakes, Knoll points us across the Atlantic to the United Kingdom, where offenders are indefinitely committed on the basis of a nebulous "Dangerous and Severe Personality Disorder" (DSPD):
Given the similarities between our SVP laws and the UK’s DSPD laws, is it too outrageous to speculate that a psychopathy (or DSPD-equivalent) commitment law might be on the U.S. horizon? Remember, the driving force behind such initiatives is usually only one highly publicized, egregious case away.
Related resource:

For an empirical study on the scientific problems with determining future violence under the UK's "Dangerous and Severe Personality Disorder" law, see: Ullrich, S., Yang, M., & Coid, J. (2009), "Dangerous and severe personality disorder: An investigation of the construct," International Journal of Law & Psychiatry (in press). The ultimate conclusions are strikingly similar to the issues posed by Knoll.

The study found a high rate of false positives -- that is, people categorized as DSPD and at high risk of serious reoffending when they actually did not reoffend when tracked in the community: 26 DSPD offenders would need to be civilly committed to prevent one major violent act.

When tracking sex crimes, which are of particular public concern, the researchers found that most new sex offenses were committed by offenders who were NOT categorized as DSPD, undermining the UK Home Office and Department of Health assumption that offenders at the highest risk for future sex offending would be categorized as DSPD.

After critiquing the accuracy of actuarial techniques, the article concludes:
"Bearing in mind the inaccuracy of DSPD criteria in identifying high risk individuals ... the construction of medico-legal terms, as in the case of DSPD, appears highly questionable.... [M]any determinants of violence are circumstantial and situational, and will invariably change over time, rather than related to some inherent characteristics of the perpetrator.... [F]ar more research is necessary ... before attempting to integrate a psychiatric condition into a legal system."
Heed these warnings, folks. The way things are headed in the U.S. criminal justice system, I expect to hear expansion of civil commitment to other groups -- violent offenders, juveniles, and others -- being proposed any minute now.

February 11, 2010

Skeem to give psychopathy training in Oregon

Save the date: Friday, April 9

On the heels of a hugely successful training featuring Stephen Hart of Simon Fraser University on sex offender risk assessment, Alexander Millkey and Michelle Guyton at Northwest Forensic Institute in Portland are doing it again. This time, they've scored Jennifer Skeem of the University of California at Irvine, who will provide advanced training on the controversial construct of psychopathy.

As many of you know, Dr. Skeem is an eminent scholar who has received the prestigious Saleem Shah Award for Early Career Excellence from the American Psychology-Law Society (APA Div 41) and the Distinguished Assistant Professor Award for Research at UC Irvine. She has published more than 70 scientific articles, chapters, and books, and is co-editor and an author of the excellent new book, Psychological Science in the Courtroom: Consensus and Controversy. Her research areas include psychopathy, violence risk assessment, and effective supervision strategies for individuals mandated for psychiatric care.

In this training, she will challenge prevailing assumptions that psychopathy is a unitary and homogeneous construct or something that can be reduced to a score on the Psychopathy Checklist-Revised (PCL-R). She will also present data challenging the deeply entrenched idea that people with psychopathic traits are incurable cases that should be diverted from treatment settings to environments where their behavior can merely be monitored and controlled.

The all-day training is on Friday, April 9 at Portland State University, and is followed by a networking reception with Dr. Skeem. Registrants will receive six hours of continuing education credits. The cost is only $175, or $75 for students.

For more information and to register, go to the Institute's website.

October 4, 2012

Long-awaited HCR-20 update to premiere in Scotland

The long-awaited international launch of the third version of the popular HCR-20 violence risk assessment instrument has been announced for next April in Edinburgh, Scotland.

The HCR-20 is an evidence-based tool using the structured professional judgment method, an alternative to the actuarial method that predicts violence at least as well while giving a more nuanced and individualized understanding. It has been evaluated in 32 different countries and translated into 18 languages.

A lot has changed in the world of risk prediction since the second edition premiered 15 years ago. Perhaps the major change in the third edition is the elimination of the need to incorporate a Psychopathy Checklist (PCL-R) score; research determined that this did not add to the instrument's predictive validity. Additionally, like the sister instrument for sex offender risk assessment, the RSVP, the HCR:V3 will focus more heavily on formulating plans to manage and reduce a person's risk, rather than merely predicting violence.

The revision process took four years, with beta testing in England, Holland, Sweden and Germany. Initial reports show very high correlations with the second edition of the HCR-20, excellent interrater reliability, and promising validity as a violence prediction tool.

The HCR:V3 will be launched at a one-day conference jointly organized by The Royal Society of Edinburgh and Violence Risk Assessment Training. Developers Christopher Webster, Stephen Hart and Kevin Douglas will be on hand to describe the research on the new instrument and its utility in violence risk assessment.

More information on the April 15, 2013 training conference is available HERE. A Webinar PowerPoint on the revision process is HERE.

October 8, 2013

Study: Risk tools don't work with psychopaths

If you want to know whether that psychopathic fellow sitting across the table from you will commit a violent crime within the next three years, you might as well flip a coin as use a violence risk assessment tool.

Popular risk assessment instruments such as the HCR-20 and the VRAG perform no better than chance in predicting risk among prisoners high in psychopathy, according to a new study published in the British Journal of Psychiatry. The study followed a large, high-risk sample of released male prisoners in England and Wales.

Risk assessment tools performed fairly well for men with no mental disorder. Utility was decreased for men diagnosed with schizophrenia or depression, became worse yet for those with substance abuse, and ranged from poor to no better than chance for individuals with personality disorders. But the instruments bombed completely when it came to men with high scores on the Psychopathy Checklist-Revised (PCL-R) (which, as regular readers of this blog know, has real-world validity problems all its own). 
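
For readers who want to see what "no better than chance" means in the usual metric, here is a minimal sketch; the scores and outcomes are randomly generated stand-ins, not data from the Coid study.

```python
# Minimal sketch (simulated data): "no better than chance" usually means an AUC
# near .50 when tool scores are compared against later violence. Data invented.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
outcomes = rng.integers(0, 2, size=200)   # 1 = violent reoffense, 0 = none
scores = rng.normal(20, 5, size=200)      # tool scores unrelated to the outcome

print(f"AUC = {roc_auc_score(outcomes, scores):.2f}  (0.50 = chance-level prediction)")
```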

"Our findings have major implications for risk assessment in criminal populations," noted study authors Jeremy Coid, Simone Ullrich and Constantinos Kallis. "Routine use of these risk assessment instruments will have major limitations in settings with high prevalence of severe personality disorder, such as secure psychiatric hospitals and prisons."

The study, "Predicting future violence among individuals with psychopathy," may be requested from the first author, Jeremy Coid (click HERE).  

December 2, 2008

Move over, Guantanamo - here comes Wisconsin

Lifetime detention for misconduct at age 14?

When he was 14 years old, Daniel Arends made a big mistake. He sexually assaulted an autistic boy. He was adjudicated as a delinquent.

Then, he made some other mistakes. His juvenile detention was extended several times for sexual contact with other boys.

When he turned 17, he learned just how much trouble he was in. He became the first juvenile that the state of Wisconsin sought to detain indefinitely under its "Sexually Violent Person" civil commitment law. He was committed to the Sand Ridge Secure Treatment Center in 2005, and he has remained there ever since. He is now 22.

Technically, there is a way for Daniel to get out of this potentially lifelong incarceration. All he must do is show that he has changed so that he no longer meets the legal criteria of being "more likely than not to commit a future act of sexual violence."

The Catch-22 is, how can one prove something like that from behind bars?

One potential method is through expert evidence. A psychologist, Dr. Sheila J. Fields, evaluated him, administered a series of tests, and wrote a favorable report stating that in her opinion he had indeed changed. He had successfully progressed in the treatment program and his score on the Psychopathy Checklist (PCL-R) was now lower. She noted that his last incident of criminal sexual activity occurred when he was 14, and there had been no reports of inappropriate sexual behavior since October 2003.

In her report, Dr. Fields also discussed some of the problems I have been blogging about lately, such as the difficulty of accurately predicting adult sexual recidivism from juvenile misconduct, and reliability problems with Daniel's diagnosis, Antisocial Personality Disorder.

Based on this favorable report, Daniel petitioned the court for a hearing on whether he still meets the civil commitment criteria. The government of Wisconsin, however, opposed the hearing, arguing Daniel was not entitled to it unless he could actually "prove" in his petition that his condition "had changed."

The local court agreed, and denied Daniel the right to even be heard in court. Shades of Guantanamo, right?

Daniel appealed, and the Wisconsin Appellate Court agreed with him. The standard for getting a hearing, the appellate court ruled, is whether the person has presented facts in his petition from which a judge or a jury “may” conclude that he has changed. In other words, he is not required to prove that he actually has changed just to get an evidentiary hearing.

The case will go back to the lower court for an evidentiary hearing. That does not mean Daniel will be released, though. For that, we'll have to stay tuned.

The Nov. 19 appellate ruling in State v. Arends (2008AP52) is online here. News coverage in the Journal Sentinel of Milwaukee is here.
Hat tip: Steve Erickson

November 2, 2013

RadioLab explores criminal culpability and the brain

Debate: Moral justice versus risk forecasting


After Kevin had brain surgery for his epilepsy, he developed an uncontrollable urge to download child pornography. If the surgery engendered Klüver-Bucy Syndrome, compromising his ability to control his impulses, should he be less morally culpable than another offender?

Blame is a fascinating episode of RadioLab that explores the debate over free will versus biology as destiny. Nita Farahany, professor of law and philosophy at Duke, is documenting an explosion in the use of brain science in court. But it's a slippery slope: Today, brain scanning technology only enables us to see the most obvious of physical defects, such as tumors. But one day, argues neuroscientist David Eagleman, we will be able to map the brain with sufficient focus to see that all behavior is a function of one perturbation or another.

Eagleman and guest Amy Phenix (of Static-99 fame) both think that instead of focusing on culpability, the criminal justice system should focus on risk of recidivism, as determined by statistical algorithms.

But hosts Jad and Robert express skepticism about this mechanistic approach to justice. They wonder whether a technocratic, risk-focused society is really one we want to live in.

The idea of turning legal decision-making over to a computer program is superficially alluring, promising to take prejudice and emotionality out of the equation. But the notion of scientific objectivity is illusory. Computer algorithms are nowhere near as value-neutral as their proponents claim. Implicit values are involved in choosing which factors to include in a model, humans introduce scoring bias (as I have reported previously in reference to the Static-99 and the PCL-R), and even supposedly neutral factors such as zip codes that are used in crime-forecasting software are coded markers of race and class. 

But that’s just on a technical level. On a more philosophical level, the notion that scores on various risk markers should determine an individual’s fate is not only unfair, punishing the person for acts not committed, but reflects a deeply pessimistic view of humanity. People are not just bundles of unthinking synapses. They are sentient beings, capable of change.

In addition, by placing the onus for future behavior entirely on the individual, the risk-factor-as-destiny approach conveniently removes society’s responsibility for mitigating the environmental causes of crime, and negates any hope of rehabilitation.

As discussed in an illuminating article on the Circles of Support and Accountability (or COSA) movement in Canada, former criminals face a catch-22 situation in which society refuses to reintegrate them, thereby elevating their risk of remaining alienated and ultimately reoffending. Yet when surrounded by friendship and support, former offenders are far less likely to reoffend, studies show.

The hour-long RadioLab episode concludes with a segment on forgiveness, featuring the unlikely friendship that developed between an octogenarian and the criminal who sexually assaulted and strangled his daughter.

That provides a fitting ending. Because ultimately, as listener Molly G. from Maplewood, New Jersey, comments on the segment’s web page, justice is a moral and ethical construct. It’s not something that can, or should, be decided by scientists.

* * * * *

The episode is highly recommended. (Click HERE to listen online or download the podcast.)

January 21, 2011

How competent are the competency evaluators?

Largest real-world study finds modest agreement among independent alienists

A bad forensic report by a bad evaluator sets bad events in motion.

True story: A judge ordered a robbery suspect to undergo evaluation. A drive-by psychologist opined that the defendant was incompetent to stand trial due to schizophrenia and attention deficit/hyperactivity disorder (ADHD). The judge rubber-stamped the evaluator's opinion. The defendant was shipped off to the dysfunctional state hospital system for competency restoration treatment. There, the psychologist's diagnoses were rubber-stamped. The unruly defendant was shot full of powerful antipsychotics, given a few months of bus therapy, and proclaimed competent. The defendant had never been psychotic in the first place. Years later, he remained just as mentally retarded as ever.

"Penny-wise, pound-foolish" is the expression that comes to mind. The courts try to save money by appointing only one psychologist per case, and by paying a ludicrously small sum that encourages shoddy practices. But cleaning up the resultant messes is costly, inefficient, and fundamentally unfair.

Competency evaluations are the bread and butter of forensic work. An estimated 60,000 defendants per year -- roughly 5% of the total -- are evaluated to see whether they understand their legal situations and can rationally assist their lawyers in their defense. But for all of the importance of accurate assessments, both to a smoothly running court system and to the rights of the mentally ill to a fair trial, surprisingly little is known about the real-world accuracy of forensic evaluators.

In the case I just outlined, the judge viewed psychologists and psychiatrists as equal and interchangeable, all inherently reliable and trustworthy. At the other extreme, some believe forensic opinions are as random as a chimp with a typewriter.

Hawaii: Exemplar or exception?

Only one U.S. state squarely addresses the problem of reliability in competency evaluations. In the Aloha State, when a doubt is raised as to a defendant's competency, three separate evaluators must conduct independent evaluations. One evaluator is a state employee; the other two are independent. One must be a psychiatrist. By law, the three cannot talk with each other about the case.

This makes Hawaii the perfect setting to examine the real-world reliability of competency evaluators. In a study just accepted for publication in Law and Human Behavior, three investigators took advantage of this opportunity to conduct the largest naturalistic study ever of evaluators' agreement about competency to stand trial.

It should not be a surprise that Daniel Murrie and Marcus Boccaccini are two of the investigators. Not the types to run Psych 101 undergrads through artificial lab experiments, these two are committed to examining forensic practice in the courtroom trenches. I've blogged about their previous work exposing "partisan allegiance" effects in the real-world application of the Psychopathy Checklist (PCL-R). For the current innovative study, they teamed up with W. Neil Gowensmith of the Hawaii courts' forensic services unit.

Examining 729 reports authored by 35 evaluators, they found that all three evaluators agreed in just under three out of four -- or 71 percent -- of initial competency referrals. Agreement was a bit lower -- 61 percent -- in cases where defendants were being reevaluated after undergoing competency restoration treatment.

Consistent with the results of a hot-off-the-press meta-analysis of 50 years of competency research, evaluators believed that the broad majority of defendants referred for evaluation, about 73 percent, were competent to stand trial. This figure was somewhat lower for defendants being reevaluated after an initial finding of competency, with evaluators opining competence in about half of such restoration cases.
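
As a minimal sketch of how the headline agreement figure is tallied, here is the computation on a handful of invented cases (the Hawaii data themselves are not reproduced here): each case is a triplet of independent opinions, and the statistic is simply the share of triplets that are unanimous.

```python
# Minimal sketch (hypothetical data): unanimous-agreement rate among three
# independent competency evaluators, as in the Hawaii design.
# 1 = opined competent, 0 = opined incompetent; opinions below are invented.
triplets = [
    (1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1), (1, 0, 1),
    (1, 1, 1), (0, 0, 0), (1, 1, 1), (1, 1, 1), (0, 1, 0),
]

unanimous = sum(1 for t in triplets if len(set(t)) == 1)
print(f"All three evaluators agreed in {unanimous / len(triplets):.0%} of cases")
```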

Why do evaluators differ?

As far as why agreement is not higher, the study raised more questions than it answered. The researchers sifted through the data looking for patterns, but none jumped out. Evaluators did not lean one way or the other by discipline (psychologist vs. psychiatrist) or by employer (state versus private practice). Defendant demographics were not explanatory, and neither were evaluator disagreements about diagnosis.

It would be interesting to conduct qualitative analyses of the 216 cases in this study to see whether those in which evaluators differed were more complex and ambiguous than the others. I suspect that to be the case.

Competency is nebulous. It exists along a continuum, so there is no precise cut point at which a defendant is automatically "competent" or "incompetent" to go forward with his legal case. Thus, evaluator agreement will never be perfect, nor -- necessarily -- should it be.

How did the judges rule?

One of the more intriguing aspects of the study was its exposition of how judges ruled after being presented with three reports. Not surprisingly, when evaluators were unanimous or split 2-1, the judges tended to go with the majority. But unlike the judge in the vignette I described earlier, many Hawaiian judges were independent thinkers who did not just rubber-stamp the evaluators' opinions.

When they disagreed with the opinions of the court-appointed psychologists and psychiatrists, it was typically to find a defendant incompetent. In fact, in a few cases the judges found defendants to be incompetent even when all three evaluators believed a defendant was competent. In this way, they elevated defendants' due-process rights over prosecutorial efficiency. But maybe that's just Hawaii.

Moving forward

I found the results somewhat encouraging. When not subjected to partisan allegiance pressures, forensic practitioners agreed about three-fourths of the time on whether a defendant was competent to stand trial.

Still, if these results are generalizable, it means evaluators will disagree in about two or three cases out of every ten. So in jurisdictions that appoint only a single evaluator, the researchers point out, many judges may be unwittingly rubber-stamping an idiosyncratic -- and even patently incorrect -- opinion:
[T]o the extent that there is a factually correct answer to the question of whether or not a defendant is competent to stand trial, relying on one evaluator increases the likelihood that the court reaches an incorrect decision (by following an incorrect single opinion that would have been revealed as a minority opinion if other evaluations were available). In some instances, this may result in delaying a trial while a defendant is unnecessarily hospitalized. In other instances this may result in a defendant proceeding to trial when additional evaluator(s) would have opined the defendant was unable to participate meaningfully in that trial….

The justice system needs to continue to wrestle with how to handle these competing demands -- efficient use of resources versus fair consideration of defendants' right to due process.

Murrie and Boccaccini are on a roll. Let's hope they keep churning out this ground-breaking line of research, examining the real-world vagaries of forensic practice, and that others climb down from the ivory towers and jump on their bandwagon.

As they note, "naturalistic studies of field reliability are an essential first step in gauging wide-scale quality across all manner of forensic practice and targeting areas for improvement."

January 12, 2014

Putting the Cart Before the Horse: The Forensic Application of the SRA-FV

As the developers of actuarial instruments such as the Static-99R acknowledge that their original norms inflated the risk of re-offense for sex offenders, a brand-new method is cropping up to preserve those inflated risk estimates in sexually violent predator civil commitment trials. The method introduces a new instrument, the “SRA-FV,” in order to bootstrap special “high-risk” norms on the Static-99R. Curious about the scientific support for this novel approach, I asked forensic psychologist and statistics expert Brian Abbott to weigh in.

Guest post by Brian Abbott, PhD*

NEWS FLASH: Results from the first peer-reviewed study about the Structured Risk Assessment: Forensic Version (“SRA-FV”), published in Sexual Abuse: Journal of Research and Treatment (“SAJRT”), demonstrate the instrument is not all that it’s cracked up to be.

[Image: Promotional material for an SRA-FV training]

For the past three years, the SRA-FV developer has promoted the instrument for clinical and forensic use despite the absence of peer-reviewed, published research supporting its validity, reliability, and generalizability. Accordingly, some clinicians who have attended SRA-FV trainings around the country routinely apply the SRA-FV in sexually violent predator risk assessments and testify about its results in court as if the instrument has been proven to measure what it intends to assess, has known error rates, retains validity when applied to other groups of sexual offenders, and produces trustworthy results.

Illustrating this rush to acceptance most starkly, within just three months of the instrument's informal release (February 2011) and in the absence of any peer-reviewed research, the state of California incredibly decided to adopt the SRA-FV as its statewide mandated dynamic risk measure for assessing sexual offenders in the criminal justice system. This decision was rescinded in September 2013, with the SRA-FV replaced by a similar instrument, the Stable-2007.

The SRA-FV consists of 10 items that purportedly measure “long-term vulnerabilities” associated with sexual recidivism risk. The items are distributed among three risk domains and are assessed either with standardized rating criteria devised by the developer or by scoring certain items on the Psychopathy Checklist-Revised (PCL-R). Scores on the SRA-FV range from zero to six. Examples of items from the instrument include sexual interest in children, lack of emotionally intimate relationships with adults, callousness, and internal grievance thinking. Patients from the Massachusetts Treatment Center in Bridgewater, Massachusetts, who were evaluated as sexually dangerous persons between 1959 and 1984, served as the SRA-FV construction group (of unknown size) and validation sample (N = 418). The instrument was released for use in December 2010, during a training held in Atascadero, California, by Dr. David Thornton, a co-developer of the Static-99R, Static-2002R, and SRA-FV and research director at the SVP treatment program in Wisconsin. Since then, Dr. Thornton has held similar trainings around the nation, where he asserts that the SRA-FV is valid for predicting sexual recidivism risk, achieves incremental validity over the Static-99R, and can be used to choose among Static-99R reference groups.

A primary focus of the trainings is a novel system in which the total score on the SRA-FV is used to select one Static-99R “reference group” from among three available options. The developer describes the statistical modeling underlying this procedure, which he claims increases predictive validity and power over using the Static-99R alone. However, no reliability data are offered to support this claim. In the December 2010 training, several colleagues and I asked for the inter-rater agreement rate, but Dr. Thornton refused to provide it.

I was astounded but not surprised when some government evaluators in California started to apply the SRA-FV in sexually violent predator risk assessments within 30 days of the December 2010 training. This trend blossomed in other jurisdictions with sexually violent predator civil confinement laws. Typically, government evaluators applied the SRA-FV to select Static-99R reference groups, invariably choosing to compare offenders with the “High Risk High Needs” sample, which carries the highest re-offense rates. A minority of clinicians stated in reports and court testimony that the SRA-FV increased predictive accuracy over the Static-99R alone, but they were unable to quantify this effect. The same clinicians argued that the pending publication of the Thornton and Knight study was sufficient to justify its use in civil confinement risk assessments for sexually violent predators. They appeared to imply that the mere fact that a construction and validation study had been accepted for publication was an imprimatur that the instrument was reliable and valid for its intended purposes. Now that the research has been peer-reviewed and published, the results show that these government evaluators put the proverbial cart before the horse.

David Thornton and Raymond Knight penned an article that documents the construction and validation of the SRA-FV. The publication is a step in the right direction, but by no means do the results justify widespread application of the SRA-FV in sexual offender risk assessment in general or sexually violent predator proceedings in particular. Rather, the results of the study only apply to the group upon which the research was conducted and do not generalize to other groups of sexual offenders. Before discussing the limitations of the research, I would like to point out some encouraging results.

The SRA-FV did, as its developer claimed, account for more sources of sexual recidivism risk than the Static-99R alone. However, it remains unknown which of the SRA-FV’s ten items contribute to risk prediction. The study also found that the combination of the Static-99R and SRA-FV increased predictive power. This improved predictive accuracy, however, must be replicated to determine whether the combination of the two instruments will perform similarly in other groups of sexual offenders. This is especially important when considering that the SRA-FV was constructed and validated on individuals from the Bridgewater sample from Massachusetts who are not representative of contemporary groups of sexual offenders. Thornton and Knight concede this point when discussing how the management of sexual offenders through all levels of the criminal justice system in Massachusetts between 1959 and 1984 was remarkably lenient compared to contemporary times. Such historical artifacts likely compromise any reliable generalization from patients at Bridgewater to present-day sexual offenders.

[Image: Training materials presented four months before the State of California rescinded use of the SRA-FV]

Probably the most crucial finding from the study is the SRA-FV’s poor inter-rater reliability. The authors categorize the 64 percent rate of agreement as “fair.” It is well known that inter-rater agreement in research studies is typically higher than in real-world applications, a point addressed previously in this blog with regard to the PCL-R. A field reliability study of the SRA-FV among 19 government psychologists rating 69 sexually violent predators in Wisconsin (Sachsenmaier, Thornton, & Olson, 2011) found an inter-rater agreement rate of only 55 percent for the SRA-FV total score, which is considered poor reliability. These data suggest that 36 to 45 percent of an SRA-FV score constitutes error, raising serious concerns about the trustworthiness of the instrument. To their credit, Thornton and Knight acknowledge this as an issue and note that steps should be taken to increase scoring reliability. Nonetheless, the current inter-rater reliability falls far short of the 80 percent floor recommended for forensic practice (Heilbrun, 1992). Unless steps are taken to dramatically improve reliability, the claims that the SRA-FV increases predictive accuracy, either alone or in combination with the Static-99R, and that it should be used to select Static-99R reference groups are moot.

It is also important to note that, although Thornton and Knight confuse the terms validation and cross-validation in their article, this study represents a validation methodology. Cross-validation is a process by which the statistical properties found in a validation sample (such as reliability, validity, and item correlations) are tested in a separate group to see whether they hold up. In contrast, Thornton and Knight first used the available research data from a small number of individuals in the Bridgewater group to determine which items would be included in the SRA-FV; this group is referred to as the construction sample. The statistical properties of the newly conceived measure were then studied on 418 Bridgewater patients who constitute the validation sample. The psychometric properties found in the validation group have not been tested on other, contemporary sexual offender groups. Absent such cross-validation studies, we simply have no confidence that the SRA-FV works as designed for groups other than the sample upon which it was validated. To their credit, Thornton and Knight acknowledge this limitation and warn readers not to generalize the validation research to contemporary groups of sexual offenders.
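
To make the construction/validation/cross-validation distinction concrete, here is a minimal Python sketch using simulated data and invented sample sizes (it is not the authors' procedure): items are selected on a construction sample, the resulting scale's properties are estimated on a separate validation sample, and genuine cross-validation would require checking those properties again in an independent, contemporary group.

    # Simulated illustration of construction vs. validation vs. cross-validation.
    import numpy as np

    rng = np.random.default_rng(0)

    def make_sample(n, n_items=20):
        """Invented item ratings (0-2) and recidivism outcomes (0/1)."""
        return rng.integers(0, 3, size=(n, n_items)), rng.integers(0, 2, size=n)

    # 1. Construction sample: choose the items most associated with the outcome.
    X_con, y_con = make_sample(150)
    corrs = [abs(np.corrcoef(X_con[:, j], y_con)[0, 1]) for j in range(X_con.shape[1])]
    chosen = sorted(range(len(corrs)), key=lambda j: corrs[j], reverse=True)[:10]

    # 2. Validation sample: estimate the new scale's properties on a fresh group.
    X_val, y_val = make_sample(418)
    validity = np.corrcoef(X_val[:, chosen].sum(axis=1), y_val)[0, 1]

    # 3. Cross-validation: the same properties must hold up in yet another,
    #    independent (e.g., contemporary) sample before generalizing.
    X_new, y_new = make_sample(300)
    replication = np.corrcoef(X_new[:, chosen].sum(axis=1), y_new)[0, 1]
    print(round(validity, 2), round(replication, 2))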

The data on incremental predictive validity, while interesting, have little practical value at this point for two reasons. One, it is unknown whether the results will replicate in contemporary groups of sexual offenders. Two, no data are provided to quantify the increased predictive power. The study does not provide an experience table of probability estimates at each score on the Static-99R after taking into account the effect of the SRA-FV scores. It seems disingenuous, if not misleading, to inform the trier of fact that the combined measures increase predictive power but to fail to quantify the result and the associated error rate.
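
To illustrate what such an “experience table” could look like if the underlying model were published, here is a hypothetical Python sketch built on a logistic model with invented coefficients; none of these numbers come from the study.

    # Hypothetical experience table: probability of sexual recidivism at each
    # Static-99R score, adjusted for the SRA-FV. All coefficients are invented.
    import math

    def predicted_probability(static99r, sra_fv, b0=-3.0, b1=0.30, b2=0.40):
        """Logistic model combining the two scores (illustrative only)."""
        return 1 / (1 + math.exp(-(b0 + b1 * static99r + b2 * sra_fv)))

    print("Static-99R | P(recid), SRA-FV=2 | P(recid), SRA-FV=5")
    for score in range(0, 11):
        print(f"{score:10d} | {predicted_probability(score, 2):18.2f} "
              f"| {predicted_probability(score, 5):18.2f}")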

In my practice, I have seen the SRA-FV used most often to select among three Static-99R reference groups. Invariably, government evaluators in sexually violent predator risk assessments assign SRA-FV total scores consistent with the selection of the Static-99R High Risk High Needs reference group. Only the risk estimates associated with the highest Static-99R scores in this reference group are sufficient to support an opinion that an individual meets the statutory level of sexual dangerousness necessary to justify civil confinement. Government evaluators who have used the SRA-FV for this purpose cannot cite research demonstrating that the procedure works as intended or that it produces a reliable match to the group representing the individual being assessed. Unfortunately, Thornton and Knight are silent on this application of the SRA-FV.

In a recently published article, I tested the use of the SRA-FV for selecting Static-99R reference groups. In brief, Dr. Thornton devised this selection model using statistical modeling based solely on data from the Bridgewater sample. The method was not based on the actual scores of members of each of the three reference groups. Rather, it was hypothetical, presuming that members of a given Static-99R reference group would exhibit a range of SRA-FV scores that does not overlap with either of the other two reference groups. To the contrary, I found that the hypothetical SRA-FV reference group system did not work as designed: the SRA-FV scores of the reference groups overlapped by wide margins. In other words, an SRA-FV total score would likely be consistent with selecting two if not all three Static-99R reference groups. In light of these findings, it is incumbent upon the developer to provide research using actual subjects to prove that the SRA-FV total score is a valid method by which to select a single Static-99R reference group and that the procedure can be applied reliably. At this point, credible support does not exist for using the SRA-FV to select Static-99R reference groups.
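
The overlap problem can be illustrated with a short sketch. The group labels other than “High Risk High Needs,” and all of the score ranges below, are placeholders rather than figures from my article.

    # Check how much the (min, max) SRA-FV total-score ranges of three
    # Static-99R reference groups overlap. Ranges are invented placeholders.
    groups = {
        "Group A (routine)": (0.0, 3.2),
        "Group B (treatment)": (1.5, 4.8),
        "High Risk High Needs": (2.6, 6.0),
    }

    def overlap(a, b):
        """Width of the score interval shared by two (min, max) ranges."""
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    names = list(groups)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            width = overlap(groups[names[i]], groups[names[j]])
            print(f"{names[i]} vs. {names[j]}: {width:.1f} points of overlap")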

The design, development, validation, and replication of psychological instruments are guided by the Standards for Educational and Psychological Testing (“SEPT”; American Educational Research Association et al., 1999). When the Thornton and Knight study is compared with the framework provided by SEPT, it is apparent that the SRA-FV is in the infancy stage of development. At best, the SRA-FV is a work in progress that needs substantially more research to improve its psychometric properties. Aside from its low reliability and the inability to generalize the validation research to other groups of sexual offenders, other important statistical properties await examination, including but not limited to:

  1. the standard error of measurement (see the sketch following this list)
  2. factor analysis of whether items within each of the three risk domains load significantly on their respective domains
  3. the extent of the correlation between each SRA-FV item and sexual recidivism
  4. which SRA-FV items add incremental validity beyond the Static-99R or are redundant with it, and whether each item has construct validity.
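
As one example, the standard error of measurement listed in item 1 follows directly from a scale's standard deviation and reliability coefficient. Here is a minimal sketch with invented numbers, simply to show the calculation that would be required.

    # SEM = SD * sqrt(1 - reliability); the numbers below are hypothetical.
    import math

    def standard_error_of_measurement(sd, reliability):
        return sd * math.sqrt(1 - reliability)

    sem = standard_error_of_measurement(sd=1.5, reliability=0.64)
    print(f"SEM = {sem:.2f} points; 95% interval around a score of 4: "
          f"{4 - 1.96 * sem:.1f} to {4 + 1.96 * sem:.1f}")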

It is reasonable to conclude that, at its current stage of development, use of the SRA-FV in forensic proceedings is premature and scientifically indefensible. In closing, in their eagerness to improve the accuracy of their risk assessments, clinicians relied upon Dr. Thornton’s claims in the absence of peer-reviewed research demonstrating that the SRA-FV achieves generally accepted levels of reliability and validity. The history of forensic evaluators deploying the SRA-FV before the publication of the construction and validation study raises significant ethical and legal questions:

  • Should clinicians be accountable for vetting the research presented in trainings by an instrument’s developer before applying a tool in forensic practice?

  • What responsibility do clinicians have to rectify testimony in which they presented the SRA-FV results as reliable and valid?

  • How many individuals have been civilly committed as sexually violent predators based on testimony that SRA-FV findings were consistent with their meeting the legal threshold for sexual dangerousness, when the published data do not support this conclusion?

Answers to these questions and others go beyond the scope of this blog. However, in a recent appellate decision, a Washington appeals court questioned the admissibility of the SRA-FV in the civil confinement trial of Steven Ritter. The appellate court determined that the application of the SRA-FV was critical to the government evaluator’s opinion that Mr. Ritter met the statutory threshold for sexual dangerousness. Since the SRA-FV is considered a novel scientific procedure, the appeals court reasoned that the trial court erred by not holding a defense-requested evidentiary hearing to decide whether the SRA-FV was admissible evidence for the jury to hear. The appeals court remanded the issue to the trial court for a Kelly-Frye hearing on the SRA-FV. Stay tuned!

References

Abbott, B.R. (2013). The utility of assessing “external risk factors” when selecting Static-99R reference groups. Open Access Journal of Forensic Psychology, 5, 89-118.

American Educational Research Association, American Psychological Association and National Council on Measurement in Education. (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.

Heilbrun, K. (1992). The role of psychological testing in forensic assessment. Law and Human Behavior, 16, 257-272. doi: 10.1007/BF01044769.

In re the Detention of Steven Ritter (2013, November). Court of Appeals of the State of Washington, Division III.

Sachsenmaier, S., Thornton, D., & Olson, G. (2011, November). Structured risk assessment forensic version (SRA-FV): Score distribution, inter-rater reliability, and margin of error in an SVP population. Presentation at the 30th Annual Research and Treatment Conference of the Association for the Treatment of Sexual Abusers, Toronto, Canada.

Thornton, D. & Knight, R.A. (2013). Construction and validation of the SRA-FV Need Assessment. Sexual Abuse: A Journal of Research and Treatment. Published online December 30, 2013. doi: 10.1177/1079063213511120.
* * *


*Brian R. Abbott is a licensed psychologist in California and Washington who has evaluated and treated sexual offenders for more than 35 years. Among his areas of forensic expertise, Dr. Abbott has worked with sexually violent predators in various jurisdictions within the United States, where he performs psychological examinations, trains professionals, consults on psychological and legal issues, offers expert testimony, and publishes papers and peer-reviewed articles.



(c) Copyright 2013 - All rights reserved