
March 5, 2013

Remarkable experiment proves pull of adversarial allegiance

 Psychologists' scoring of forensic tools depends on which side they believe has hired them

A brilliant experiment has proven that adversarial pressures skew forensic psychologists' scoring of supposedly objective risk assessment tests, and that this "adversarial allegiance" is not due to selection bias or preexisting differences among evaluators.

The researchers duped about 100 experienced forensic psychologists into believing they were part of a large-scale forensic case consultation at the behest of either a public defender service or a specialized prosecution unit. After two days of formal training by recognized experts on two widely used forensic instruments -- the Psychopathy Checklist-R (PCL-R) and the Static-99R -- the psychologists were paid $400 to spend a third day reviewing cases and scoring subjects. The National Science Foundation picked up the $40,000 tab.

Unbeknownst to them, the psychologists were all looking at the same set of four cases. But they were "primed" to consider the case from either a defense or prosecution point of view by a research confederate, an actual attorney who pretended to work on a Sexually Violent Predator (SVP) unit. In his defense attorney guise, the confederate made mildly partisan but realistic statements such as "We try to help the court understand that ... not every sex offender really poses a high risk of reoffending." In his prosecutor role, he said, "We try to help the court understand that the offenders we bring to trial are a select group [who] are more likely than other sex offenders to reoffend." In both conditions, he hinted at future work opportunities if the consultation went well. 

The deception was so cunning that only four astute participants smelled a rat; their data were discarded.

As expected, the adversarial allegiance effect was stronger for the PCL-R, which is more subjectively scored. (Evaluators must decide, for example, whether a subject is "glib" or "superficially charming.") Scoring differences on the Static-99R only reached statistical significance in one out of the four cases.

The groundbreaking research, to be published in the journal Psychological Science, echoes previous findings by the same group regarding partisan bias in actual court cases. But by conducting a true experiment in which participants were randomly assigned to either a defense or prosecution condition, the researchers could rule out selection bias as a cause. In other words, the adversarial allegiance bias cannot be solely due to attorneys shopping around for simpatico experts, as the experimental participants were randomly assigned and had no group differences in their attitudes about civil commitment laws for sex offenders.

Sexually Violent Predator cases are an excellent arena for studying adversarial allegiance, because the typical case boils down to a "battle of the experts." Often, the only witnesses are psychologists, all of whom have reviewed essentially the same material but have differing interpretations about mental disorder and risk. In actual cases, the researchers note, the adversarial pressures are far higher than in this experiment:
"This evidence of allegiance was particularly striking because our experimental manipulation was less powerful than experts are likely to encounter in most real cases. For example, our participating experts spent only 15 minutes with the retaining attorney, whereas experts in the field may have extensive contact with retaining attorneys over weeks or months. Our experts formed opinions based on files only, which were identical across opposing experts. But experts in the field may elicit different information by seeking different collateral sources or interviewing offenders in different ways. Therefore, the pull toward allegiance in this study was relatively weak compared to the pull typical of most cases in the field. So the large group differences provide compelling evidence for adversarial allegiance."

This is just the latest in a series of stunning findings on allegiance bias from this team of psychologists, led by Daniel Murrie of the University of Virginia and Marcus Boccaccini of Sam Houston State University. The tendency of experts to skew data to fit the side that retains them should come as no big surprise. After all, it is consistent with the 2009 National Academy of Sciences report calling into question the reliability of many types of forensic science evidence, including supposedly objective techniques such as fingerprint analysis.

Although the group's findings have heretofore been published only in academic journals and have found a limited audience outside of the profession, this might change. A Huffington Post blogger, Wray Herbert, has published a piece on the current findings, which he called "disturbing." And I predict more public interest if and when mainstream journalists and science writers learn of this extraordinary line of research.

In the latest study, Murrie and Boccaccini conducted follow-up analyses to determine how often matched pairs of experts differed in the expected direction. On the three cases in which clear allegiance effects showed up in PCL-R scoring, more than one-fourth of score pairings differed by more than six points in the expected direction. Six points equates to about two standard errors of measurement (SEMs), a gap that should occur by chance in only about 2 percent of cases. A similar, albeit milder, effect was found with the Static-99R.
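
For readers who want to see where the 2 percent comes from: treating the six-point gap as two SEMs and assuming score differences follow a normal distribution, the one-tailed probability of so large a gap in the predicted direction is roughly 2 percent. Here is a minimal sketch of that arithmetic; the SEM value of about three points and the normal model are my simplifying assumptions, not figures taken from the article.

```python
# Back-of-the-envelope check of the "two SEMs ~ 2 percent" figure.
# Illustrative only; the assumed SEM and the normal model are simplifications,
# not numbers from the published study.
from scipy.stats import norm

sem = 3.0                       # assumed PCL-R standard error of measurement
gap = 6.0                       # prosecution-minus-defense score difference

z = gap / sem                   # the gap expressed in SEM units (= 2.0)
p_one_tailed = 1 - norm.cdf(z)  # chance of a gap this large in the predicted direction

print(f"{gap}-point gap = {z:.1f} SEMs; one-tailed chance probability = {p_one_tailed:.1%}")
# -> 6.0-point gap = 2.0 SEMs; one-tailed chance probability = 2.3%
```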

Adversarial allegiance effects might be even stronger in less structured assessment contexts, the researchers warn. For example, clinical diagnoses and assessments of emotional injuries involve even more subjective judgment than scoring of the Static-99 or PCL-R.

But ... WHICH psychologists?!


For me, this study raised a tantalizing question: Since only some of the psychologists succumbed to the allegiance effect, what distinguished those who were swayed by the partisan pressures from those who were not?

The short answer is, "Who knows?"

The researchers told me that they ran all kinds of post-hoc analyses in an effort to answer this question, and could not find a smoking gun. As in a previous research project that I blogged about, they did find evidence for individual differences in scoring of the PCL-R, with some evaluators assigning higher scores than others across all cases. However, they found nothing about individual evaluators that would explain susceptibility to adversarial allegiance. Likewise, the allegiance effect could not be attributed to a handful of grossly biased experts in the mix.

In fact, although score differences tended to go in the expected direction -- with prosecution experts giving higher scores than defense experts on both instruments -- there was a lot of variation even among the experts on the same side, and plenty of overlap between experts on opposing sides.

So, on average prosecution experts scored the PCL-R about three points higher than did the defense experts. But the scores given by experts on any given case ranged widely even within the same group. For example, in one case, prosecution experts gave PCL-R scores ranging from about 12 to 35 (out of a total of 40 possible points), with a similarly wide range among defense experts, from about 17 to 34 points. There was quite a bit of variability on scoring of the Static-99R, too; on one of the four cases, scores ranged all the way from a low of two to a high of ten (the maximum score being 12).

When the researchers debriefed the participants, they found that the participants themselves had no clue as to what caused the effect. That's likely because bias is mostly unconscious, and people tend to recognize it in others but not in themselves. So, when asked about factors that make psychologists vulnerable to allegiance effects, the participants endorsed things that applied to others, not to themselves: Those who worked at state facilities thought private practitioners were more vulnerable; experienced evaluators thought that inexperience was the culprit. (It wasn't.)

I tend to think that greater training in how to avoid falling prey to cognitive biases (see my previous post exploring this) could make a difference. But this may be wrong; the experiment to test my hypothesis has not been run. 

The study is: "Are forensic experts biased by the side that retained them?" by Daniel C. Murrie, Marcus T. Boccaccini, Lucy A. Guarnera and Katrina Rufino, forthcoming from Psychological Science. Contact the first author (HERE) if you would like to be put on the list to receive a copy of the article as soon as it becomes available.

Click on these links for lists of my numerous prior blog posts on the PCL-R, adversarial allegiance, and other creative research by Murrie, Boccaccini and their prolific team. Among my all-time favorite experiments from this research team is: "Psychopathy: A Rorschach test for psychologists?"

February 5, 2013

Texas SVP jurors ignoring actuarial risk scores

Expert witness for defense makes a (small) difference, study finds

The fiery debates surrounding the validity of actuarial tools for predicting violence risk raise a practical question: How much influence do these instruments really have on legal decision-makers? The answer, at least when it comes to jurors in Sexually Violent Predator trials in Texas:

Not much.

"Despite great academic emphasis on risk measures - and ongoing debates about the value, accuracy, and utility of risk-measure scores reported in SVP hearings - our findings suggest these risk measure scores may have little impact on jurors in actual SVP hearings."

The researchers surveyed 299 jurors at the end of 26 sexually violent predator trials. Unfortunately, they could not directly measure the relationship between risk scores and civil commitment decisions because, this being Texas, juries slam-dunked 25 out of 26 sex offenders, hanging in only one case (which ultimately ended in commitment after a retrial).  

Instead of the ultimate legal outcome, the researchers had to rely on proxy outcome measures, including jurors' ratings of how dangerous an individual was (specifically, how likely he would be to commit a new sex offense within one year of release), and their assessment of how difficult it was to make a decision in their case.

There was no evidence that jurors' assessments of risk or decision difficulty varied based on respondents' scores on risk assessment tools, which in each case included the Static-99, MnSOST-R and the PCL-R. This finding, by the prolific team of Marcus Boccaccini, Daniel Murrie and colleagues, extends into the real world prior mock trial evidence that jurors in capital cases and other legal proceedings involving psychology experts are more heavily influenced by clinical than actuarial testimony.

What did make a difference to jurors was whether the defense called at least one witness, and in particular an expert witness. Overall, there was a huge imbalance in expert testimony, with almost all of the trials featuring two state experts, but only seven of 26 including even one expert called by the defense.

"Skepticism effect"

The introduction of a defense expert produced a "skepticism effect," the researchers found, in which jurors became more skeptical of experts' ability to predict future offending. However, jurors' lower risk ratings in these cases could also have been due to real differences in the cases. In SVP cases involving legitimately dangerous sex offenders, defense attorneys often have trouble finding experts willing to testify. In other words, the researchers note, "the reduced ratings of perceived risk associated with the presence of a defense expert may be due to nonrandom selection … as opposed to these defense experts' influencing jurors."

A back story here pertains to the jury pool in the Texas county in which civil commitment trials are held. All SVP trials take place in Montgomery County, a "very white community," an attorney there told me. A special e-juror selection process for SVP jurors whitens the jury pool even more, disproportionately eliminating Hispanics and African Americans. Meanwhile, many of those being referred for civil commitment are racial minorities. The potentially unconstitutional race discrepancy is the basis for one of many current legal challenges to the SVP system in Texas.

Once a petition for civil commitment as a sexually violent predator is filed in Texas, the outcome is a fait accompli. Since the inception of the state's SVP law, only one jury has unanimously voted against civil commitment. Almost 300 men have been committed, and not a single one has been released.

Overall, the broad majority of jurors in the 26 SVP trials were of the opinion that respondents were likely to reoffend in the next year. Based on this heightened perception of risk, the researchers hypothesize that jurors may have found precise risk assessment ratings irrelevant because any risk was enough to justify civil commitment.

In a previous survey of Texas jurors, more than half reported that even a 1 percent chance of recidivism was enough to qualify a sex offender as dangerous. To be civilly committed in Texas, a sex offender must be found "likely" to reoffend, but the state's courts have not clarified what that term means.  

Risk scores could also be irrelevant to jurors motivated more by a desire for retribution than a genuine wish to protect the public, the researchers pointed out. "Although SVP laws are ostensibly designed to provide treatment and protect the public, experimental research suggests that many mock jurors make civil commitment decisions based more on retributive motives - that is, the desire to punish sexual offenses—than the utilitarian goal of protecting the public…. Jurors who adopt this mindset may spend little time thinking about risk-measure scores."

All this is not to say that actuarial scores are irrelevant. They are highly influential in the decisions that take place leading up to an SVP trial, including administrative referrals for full evaluations, the opinions of the evaluators themselves as to whether an offender meets civil commitment criteria, and decisions by prosecutors as to which cases to select for trial.

"But the influence of risk scores appears to end at the point when laypersons make decisions about civilly committing a select subgroup of sexual offenders," the researchers noted.

Bottom line: Once a petition for civil commitment as a sexually violent predator is filed in Texas, it's the end of the line. The juries are ultra-punitive, and the deck is stacked, with government experts outnumbering experts called by the defense in every case. It remains unclear to what extent these results might generalize to SVP proceedings in other states with less conservative jury pools and/or more balanced proceedings.

  • The study, "Do Scores From Risk Measures Matter to Jurors?" by Marcus Boccaccini, Darrel Turner, Craig Henderson and Caroline Chevalier of Sam Houston State University and Daniel Murrie of the University of Virginia, is slated for publication in an upcoming issue of Psychology, Public Policy, and Law. To request a copy, email the lead researcher (HERE).

October 4, 2012

Long-awaited HCR-20 update to premiere in Scotland

The long-awaited international launch of the third version of the popular HCR-20 violence risk assessment instrument has been announced for next April in Edinburgh, Scotland.

The HCR-20 is an evidence-based tool that uses the structured professional judgment method, an alternative to the actuarial approach; it predicts violence at least as well as actuarial tools while giving a more nuanced and individualized understanding. It has been evaluated in 32 different countries and translated into 18 languages.

A lot has changed in the world of risk prediction since the second edition premiered 15 years ago. Perhaps the major change in the third edition is the elimination of the need to incorporate a Psychopathy Checklist (PCL-R) score; research determined that this did not add to the instrument's predictive validity. Additionally, like the sister instrument for sex offender risk assessment, the RSVP, the HCR:V3 will focus more heavily on formulating plans to manage and reduce a person's risk, rather than merely predicting violence.

The revision process took four years, with beta testing in England, Holland, Sweden and Germany. Initial reports show very high correlations with the second edition of the HCR-20, excellent interrater reliability, and promising validity as a violence prediction tool.

The HCR:V3 will be launched at a one-day conference jointly organized by The Royal Society of Edinburgh and Violence Risk Assessment Training. Developers Christopher Webster, Stephen Hart and Kevin Douglas will be on hand to describe the research on the new instrument and its utility in violence risk assessment.

More information on the April 15, 2013 training conference is available HERE. A Webinar PowerPoint on the revision process is HERE.

August 2, 2012

Violence risk instruments overpredicting danger

Tools better at screening for low risk than pinpointing high risk 


The team of Seena Fazel and Jay Singh is at it again, bringing us yet another gigantic review of studies on the accuracy of the most widely used instruments for assessing risk of violence and sexual recidivism.


This time, the prolific researchers -- joined by UK statistician Helen Doll and Swedish professor Martin Grann -- report on a total of 73 research samples comprising 24,847 people from 13 countries. Cumulatively, the samples had a high base rate of reoffense, with almost one in four reoffending over an average of about four years.

Bottom line: Risk assessment instruments are fairly good at identifying low risk individuals, but their high rates of false positives -- people falsely flagged as recidivists -- make them inappropriate “as sole determinants of detention, sentencing, and release.”

In all, about four out of ten of those individuals judged to be at moderate to high risk of future violence went on to violently offend. Prediction of sexual reoffense was even poorer, with less than one out of four of those judged to be at moderate to high risk going on to sexually offend. In samples with lower base rates, the researchers pointed out, predictive accuracy will be even poorer.

What that means, in practical terms, is that to stop one person who will go on to become violent again in the future, society must lock up at minimum one person who will NOT; for sex offenders, at least three non-recidivists must be detained for every recidivist. This, of course, is problematic from a human rights standpoint. 
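
The "one for one" and "three for one" figures follow directly from the positive predictive values quoted above. Here is a minimal sketch of the arithmetic, using the approximate values from this post (about four in ten for violence, just under one in four for sexual reoffense) rather than the review's exact numbers.

```python
# Detention ratios implied by a positive predictive value (PPV).
# The PPV values below are the approximate figures quoted in the post,
# not exact numbers from the meta-analysis.
def non_recidivists_per_recidivist(ppv: float) -> float:
    """For every true recidivist flagged as high risk, how many
    non-recidivists get flagged (and potentially detained) with him?"""
    return (1 - ppv) / ppv

print(round(non_recidivists_per_recidivist(0.40), 1))  # violence: about 1.5 to 1
print(round(non_recidivists_per_recidivist(0.24), 1))  # sexual reoffense: about 3.2 to 1
```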

Another key finding goes against conventional wisdom: actuarial instruments that focus on historical risk factors performed no better than tools based on clinical judgment, contrary to some previous reviews.

The researchers included the nine most commonly used risk assessment tools, out of the many dozens that have now been developed around the world:
  • Level of Service Inventory-Revised (LSI-R) 
  • Psychopathy Checklist-Revised (PCL-R) 
  • Sex Offender Risk Appraisal Guide (SORAG) 
  • Static-99 
  • Violence Risk Appraisal Guide (VRAG) 
  • Historical, Clinical, Risk management-20 (HCR-20) 
  • Sexual Violence Risk-20 (SVR-20) 
  • Spousal Assault Risk Assessment (SARA) 
  • Structured Assessment of Violence Risk in Youth (SAVRY) 
Team leader Fazel, of Oxford University, and colleagues stressed several key implications of their findings:
One implication of these findings is that, even after 30 years of development, the view that violence, sexual, or criminal risk can be predicted in most cases is not evidence based. This message is important for the general public, media, and some administrations who may have unrealistic expectations of risk prediction for clinicians. 

A second and related implication is that these tools are not sufficient on their own for the purposes of risk assessment. In some criminal justice systems, expert testimony commonly uses scores from these instruments in a simplistic way to estimate an individual’s risk of serious repeat offending. However, our review suggests that risk assessment tools in their current form can only be used to roughly classify individuals at the group level, and not to safely determine criminal prognosis in an individual case. 

Finally, our review suggests that these instruments should be used differently. Since they had higher negative predictive values, one potential approach would be to use them to screen out low risk individuals. Researchers and policy makers could use the number safely discharged to determine the potential screening use of any particular tool, although its use could be limited for clinicians depending on the immediate and service consequences of false positives. 

A further caveat is that specificities were not high -- therefore, although the decision maker can be confident that a person is truly low risk if screened out, when someone fails to be screened out as low risk, doctors cannot be certain that this person is not low risk. In other words, many individuals assessed as being at moderate or high risk could be, in fact, low risk. 
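
A hypothetical two-by-two table makes the quoted pattern concrete: with a base rate of roughly one in four, a tool can have a reassuringly high negative predictive value while its specificity and positive predictive value remain modest. The counts below are invented for illustration; they are not data from the review.

```python
# Hypothetical 2x2 illustration of the pattern described in the quote:
# high negative predictive value (good for screening OUT low-risk cases)
# alongside only moderate specificity and positive predictive value.
# All counts are made up for illustration, not taken from the review.
tp, fn = 180, 60     # recidivists flagged high risk / missed
fp, tn = 289, 471    # non-recidivists flagged high risk / correctly cleared

sensitivity = tp / (tp + fn)   # 0.75
specificity = tn / (tn + fp)   # 0.62
ppv = tp / (tp + fp)           # ~0.38: most "high risk" flags are false alarms
npv = tn / (tn + fn)           # ~0.89: a "low risk" call is usually right

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"PPV={ppv:.2f} NPV={npv:.2f}")
```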

My blog post on these researchers' previous meta-analytic study, Violence risk meta-meta: Instrument choice does matter, is HERE.

May 29, 2012

SVP risk tools show 'disappointing' reliability in real-world use

Rater agreement on three instruments commonly used  to assess sex offenders' risk of recidivism is much lower in practice than reported in the tools' manuals, according to a new study out of Florida.

Faring most poorly was the Psychopathy Checklist (PCL-R). Correlations of scores between two evaluators hired by the same agency were in the low range. On average, psychologists differed by five points on the instrument, which has a score range of zero to 40. In one case, two evaluators were apart by a whopping 24 points!

Agreement among evaluators was only moderate on the Static-99 and the MnSOST-R, two actuarial risk assessment instruments for which scoring is relatively more straightforward.

The study, published in the respected journal Psychological Assessment, was a collaboration between scholars from the Department of Mental Health Law and Policy at the University of South Florida and researchers with the Florida Department of Children and Families. It utilized archived records culled from the almost 35,000 individuals screened for possible Sexually Violent Predator (SVP) civil commitment in Florida between 1999 and 2009. The researchers located 315 cases in which the same individual was evaluated by separate clinicians who each administered both the PCL-R and at least one of the two actuarial measures within a short enough time frame to enable direct scoring comparisons.

It would be a mistake to lean too heavily on the results of a single isolated study. But the present study adds to a burgeoning body of research from several groups of independent researchers, all pointing to troubling problems with the accuracy of instruments designed to forecast risk of recidivism among sex offenders.

Related study: Psychopathy and sexual deviance not predictive

Collectively, the research has been especially critical of the ability of the highly prejudicial construct of psychopathy to add meaningfully to risk prediction in this high-stakes arena. Indeed, just this week another study has come out indicating that neither psychopathy scores nor sexual deviance measures improve on the accuracy provided by an actuarial instrument alone.

An especially interesting finding of that Canadian study is that reoffense rates were still below 12 percent over a 6-year followup period for even the most high-risk offenders -- those with high risk ratings on the Static-99R plus high levels of psychopathy and sexual deviance (as measured by phallometric testing). This makes it inappropriate to inflate risk estimates over and above those derived from Static-99R scores alone, the authors caution.

Item-level analysis finds varying rates of accuracy

A unique contribution of the Florida study is its analysis of the relative accuracy of every single item in each of the three instruments studied. Handy tables allow a forensic practitioner to see which items have the poorest reliability, meaning they should be viewed skeptically by forensic decision-makers.

For example, take the MnSOST-R, a now-defunct instrument with a score range of –14 to 31 points. The total gap between evaluators was as wide as 19 points; the items with the greatest variability in scoring were those pertaining to offenders' functioning during incarceration, such as participation in treatment.

Meanwhile, the weak performance of the Psychopathy Checklist owes much to the items on its so-called “Factor 1,” which attempt to measure the personality style of the psychopath. As I've discussed before, rating someone as “glib,” “callous” or “shallow” is a highly subjective enterprise that opens the door to a veritable avalanche of personal bias.

Piggy-backing off a recommendation by John Edens and colleagues, the Florida team suggests that the prejudicial deployment of the Psychopathy Checklist may be superfluous, in that scores on Factor 2 alone (the items reflecting a chronic criminal lifestyle) are more predictive of future violence or sexual recidivism.

Next up, we need to identify the causes of the poor interrater reliability for forensic risk prediction instruments in real-world settings. Is it due to inadequate training, differing clinical skills, variable access to collateral data, intentional or unintentional bias on the part of examiners, adversarial allegiance effects (not a factor in the present study, since both evaluators were appointed by the same agency), or some combination?

In the meantime, the fact that two evaluators working on the same side cannot reliably arrive at the same risk rating for any particular individual should certainly raise our skepticism about the validity of risk prediction based on these instruments.

The studies are:

Reliability of Risk Assessment Measures Used in Sexually Violent Predator Proceedings. Cailey Miller, Eva Kimonis, Randy Otto, Suzonne Kline and Adam Wasserman. Psychological Assessment. Published online 7 May 2012. Advance online publication. Click HERE to contact the authors.

Does Consideration of Psychopathy and Sexual Deviance Add to the Predictive Validity of the Static-99R? Jan Looman, Nicola A. C. Morphett and Jeff Abracen. International Journal of Offender Therapy and Comparative Criminology. Published online 28 May 2012. Click HERE to contact the authors.



October 30, 2011

Study: Psychopathy score fails to predict sexual recidivism

Many forensic psychologists believe that psychopathy is a risk factor for sex offender recidivism. Not surprisingly, when forensic psychologists assign a sex offender a high score on a psychopathy test, it increases the risk of extreme legal sanctions such as civil commitment.

But a new study out of Texas found zero correlation between sexual recidivism and psychopathy, as measured by the widely used Psychopathy Checklist (PCL-R). If anything, sex offenders who were arrested for a new sexually violent offense tended to have lower scores on the PCL-R than those who were not rearrested!

Regular blog readers should be familiar with these researchers by now: Dan Murrie, Marcus Boccaccini and crew are the same scholars who informed us of the partisan allegiance phenomenon, in which evaluators working for the government tend to assign far higher PCL-R scores than do those working for the defense.

In their most recent study, they examined PCL-R scores from about 400 sex offenders in Texas who were released from prison and then tracked for anywhere from about two to seven years. They examined not just the total scores on the PCL-R, but also scores on the instrument's two factors, as well as four so-called facet scores. Not one of these seven PCL-R variables was a statistically significant predictor of whether a man would be arrested for a new sex crime.

“Overall, these predictive validity findings were striking because the PCL-R apparently failed to predict the type of outcome (i.e., sexually violent reoffense) for which it was administered in this context,” the authors noted.

Further, in cases in which the PCL-R was administered by more than one evaluator, the study found poor agreement between the two, even though both were working for the government. Consistent with prior research, interrater agreement was higher on Factor 2, which measures antisocial behavior and an impulsive lifestyle, than on Factor 1, which measures the vaguely operationalized personality and emotional dynamics thought to underlie psychopathy.

In an interesting twist, the researchers tried to determine whether some evaluators were more accurate than others at predicting recidivism through PCL-R scores. They identified four highly prolific evaluators; together, these three psychologists and one medical doctor had provided almost two-thirds of the PCL-R scores in the study. Although the PCL-R scores of three of these four evaluators were more likely than other evaluators' scores to correlate with a new arrest for a non-sexual crime, even these evaluators could not produce PCL-R scores that predicted sexual offense recidivism.

Despite the PCL-R’s lack of predictive validity, sex offenders with higher PCL-R scores were more likely than others to be recommended for civil commitment, indicating that the unreliable rating was far from harmless in forensic practice.

The study is: 

Murrie, D. C., Boccaccini, M. T., Caperton, J. and Rufino, K. Field Validity of the Psychopathy Checklist–Revised in Sex Offender Risk Assessment. Psychological Assessment. Click HERE to request a copy from the first author, at the Institute of Law, Psychiatry, and Public Policy.


June 19, 2011

Violence risk meta-meta: Instrument choice does matter

Despite popularity, psychopathy test and actuarials not superior to other prediction methods 

The past couple of decades have seen an explosion of interest in forensic assessment of risk for future violent and sexual recidivism. Accordingly, evaluators can now choose from an array of more than 120 different risk assessment tools. But should this choice be based on individual preference, or are some instruments clearly superior to others?

Several systematic reviews and metaanalyses have addressed this question, but their conclusions often conflict. In the first systematic review of these reviews (called a “meta-review”), Jay Singh and Seena Fazel of Oxford University found that methodological shortcomings may contribute to the confusion. Problems they identified in the 40 metaanalyses and reviews they studied included authors' failure to adequately describe their study search procedures, failure to check for overlapping samples or publication bias, and failure to investigate the confound of sample heterogeneity.

The Oxford scholars, along with Martin Grann of Sweden's Centre for Violence Prevention, set out to rectify this problem via a more methodologically rigorous meta-review, using optimal data analyses and reporting procedures. For this purpose, they used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), a 27-item checklist designed to enable transparent and consistent reporting of results.

For their meta-meta (a metaanalysis of the metaanalyses), they collected data from 68 studies involving about 26,000 participants in 13 countries, focusing on the accuracy of the nine most commonly used forensic risk assessment instruments:
  • Psychopathy Checklist (PCL-R)
  • Static-99
  • Historical, Clinical, Risk Management-20 (HCR-20)
  • Violence Risk Appraisal Guide (VRAG)
  • Sexual Violence Risk-20 (SVR-20)
  • Level of Service Inventory (LSI-R)
  • Sex Offender Risk Appraisal Guide (SORAG)
  • Spousal Assault Risk Assessment (SARA)
  • Structured Assessment of Violence Risk in Youth (SAVRY)
Big differences in predictive validity

As it turns out, these widely used instruments vary substantially in predictive accuracy. Performing the best was the SAVRY, a risk assessment instrument designed for use with adolescents. At the bottom were the Level of Service Inventory and the Psychopathy Checklist. This is not too surprising, as the LSI-R is used with a wide variety of general offenders, and the PCL-R was not designed for risk prediction in the first place.



The present metaanalysis would therefore argue against the view of some experts that the PCL-R is unparalleled in its ability to predict future offending.

Statistical method matters: DOR outperforms AUC

The researchers compared several different methods of measuring predictive accuracy. They found that a popular statistic called the Area Under the Curve (AUC) was the weakest. Use of the AUC statistic may help to explain why some metaanalyses were unable to find significant differences among instruments, the authors theorize.

Better methods for comparing instruments’ predictive accuracy include calculating positive and negative predictive values and also using something called the Diagnostic Odds Ratio, or DOR. This is the ratio of the odds of a positive test result in an offender (true positive) relative to the odds of a positive result in a non-offender (false positive). The authors’ summary performance scores pooled results from all four statistical methods.
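
For the statistically curious, the diagnostic odds ratio is simple to compute from a two-by-two outcome table. A minimal sketch following the definition above, with counts invented purely for illustration:

```python
# Diagnostic odds ratio (DOR) from a 2x2 outcome table, per the definition
# in the text. All counts are invented for illustration only.
tp, fn = 40, 20   # recidivists: flagged high risk / not flagged
fp, tn = 60, 80   # non-recidivists: flagged high risk / not flagged

odds_flagged_if_recidivist = tp / fn       # odds of a "high risk" call among recidivists
odds_flagged_if_nonrecidivist = fp / tn    # odds of a "high risk" call among non-recidivists

dor = odds_flagged_if_recidivist / odds_flagged_if_nonrecidivist
# Equivalent shortcut: (tp * tn) / (fp * fn)
print(f"DOR = {dor:.2f}")                  # 2.67 with these illustrative counts
```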

Actuarials not superior; race also matters

The poor performance of the Psychopathy Checklist (PCL-R) was not the only finding that may surprise some forensic evaluators. The researchers also found no evidence that actuarial tools – such as the widely touted Static-99 – outperform structured clinical judgment methods like the HCR-20 or the SVR-20.

They also found that an offender's race is critical to predictive accuracy. Risk assessment instruments perform best on white offenders, most likely because white offenders predominate in the underlying studies. This is consistent with other research, including a study by Dernevik and colleagues finding that risk assessment instruments are poor at predicting misconduct in terrorists.

Caution is therefore warranted when using any risk assessment tool to predict offending in samples dissimilar to their validation samples, the authors stress.

This systematic review appears to be the most methodologically rigorous such study to date, in a rapidly evolving field. I recommend obtaining both articles (see below for author contact information) and studying them carefully. The stakes are high, and it behooves us to use the instruments that are the most accurate for the specific purpose at hand.


May 26, 2011

PCL-R inventor wringing his hands over forensic misuse

The first part of NPR's series on the psychopathy industry aired today and the transcript is now online (HERE), along with my sidebar essay on the cultural history of psychopathy (HERE). Most fascinating to me are recent efforts by Robert Hare, inventor of the popular Psychopathy Checklist-Revised (PCL-R), to distance himself from growing evidence of its misuse in forensic contexts:
[Photo: Robert Dixon Jr., featured in the NPR story]
While Hare remains a strong believer that his test works well for the kind of basic scientific research that it was originally designed for, he and others have begun to wonder if it does as good a job outside the lab.

"I'm very concerned about the inappropriate use of this instrument for purposes that have serious implications for individuals and for society," Hare says. "It shouldn't work that way."

In fact, Hare says, he is so disturbed by some of what he has seen as he has traveled through America training psychologists in use of the PCL-R, that he sometimes has trouble focusing on the way his test could be affecting people's lives.

"I think about this periodically, and I probably try to suppress it," Hare says. "I do disassociate myself from it. I mean, if I thought about every potential use or misuse of the instrument, I probably wouldn't sleep at all."

"Alarming world of globe-trotting experts"

Hare goes even further in a series of interviews with journalist Jon Ronson, author of the new book, The Psychopath Test. Over late-night drinks at hotel bars, he tells the author that he is especially chagrined at the PCL-R’s use by poorly trained and biased evaluators in Sexually Violent Predator (SVP) cases in the United States:
“ ‘I tried to train some of the people who administer it. They were sitting around, twiddling their thumbs, rolling their eyes, doodling, cutting their fingernails – these were people who were going to use it.’

“A Coalinga psychiatrist, Michael Freer, told the Los Angeles Times in 2007 that more than a third of Coalinga ‘individuals’ (as the inmates there are called) had been misdiagnosed as violent predators and would in fact pose no threat to the public if released. ‘They did their time, and suddenly they are picked up again and shipped off to a state hospital for essentially an indeterminate period of time,’ Freer said. ‘To get out they have to demonstrate that they are no longer a risk, which can be a very high standard. So, yeah, they do have grounds to be very upset.’

“In the executive bar, Bob Hare continued. He told me of an alarming world of globe-trotting experts, forensic psychologists, criminal profilers, traveling the planet armed with nothing much more than a Certificate of Attendance, just like the one I had. These people might have influence inside parole hearings, death penalty hearings, serial-killer incident rooms, and on and on. I think he saw his checklist as something pure – innocent as only science can be – but the humans who administered it as masses of weird prejudices and crazy dispositions.”

If Hare’s conscience is really bothering him, he could do more than try to distance himself from miscarriages of justice in interviews with journalists after the fact. He could stop training the legions of government SVP evaluators in the United States, and/or issue a policy statement about the misuse of his instrument in court.

Of course, that would mean a significant loss of revenue. Even Ronson, the author of The Psychopath Test, had to pay 400 pounds (media discount) to gain access to Hare at a 3-day PCL-R training course. And that didn’t include the cost of the 30-page manual, another 361 pounds.



My review of The Psychopath Test at Amazon:

The power to label is intoxicating. That’s what Jon Ronson found after taking a 3-day training that gave him license to diagnose people as psychopaths. Armed with a 40-item checklist, the journalist went gallivanting around the globe, sniffing out prospective psychopaths from convicted murderers to corporate job-slashers and Haitian war criminals. Ronson’s chronicle of his two-year quest for the elusive psychopath is at times whimsical, sometimes laugh-out-loud funny, and always riveting.
The review continues HERE. (As always, if you enjoy it, please click “yes.”)

May 25, 2011

NPR series on psychopathy in court

This Thursday and Friday, NPR is airing a 2-part series about the controversial use of psychopathy in court.

In Part I on All Things Considered, psychology and science reporter Alix Spiegel shows the profound negative consequences of psychopathy on the lives of those so designated. She profiles a California prisoner whose friends and family do not believe he is a psychopath, but who will likely never win parole due to that label.

Part II (on Friday) examines the history of the Psychopathy Checklist (PCL-R) and explores how it became so entrenched in the criminal justice system. Dan Murrie’s research questioning the reliability of the PCL in adversarial contexts is highlighted. Robert Hare also expresses concerns about the way his tool is being used in court, and the corrupting influences of the adversarial system.

The show will be complemented by an online debate, “Does the PCL-R test have value in the criminal justice system?” I am one of the featured experts on the panel, so I hope some of you will visit the website; I will provide a link just as soon as it goes live.

For those of you who would rather tune in once, the show is also airing as a single piece on Friday on the radio program This American Life, and you will be able to download the podcast there as well.

Part I of the 2-part series is HERE; the online debate is HERE.

March 25, 2011

Psychopathy: A Rorschach test for psychologists?

  • Compassion
  • Empathy
  • Impulsivity
  • Excitement-seeking
What do these personality traits have in common?

If you are high on any or all of them, you may be less likely to rate other people as psychopathic on the Psychopathy Checklist (PCL-R).

The PCL-R is the most widely used measure of psychopathy in the world. But in real-world forensic settings, scores vary widely depending upon which side retained the evaluator. This finding is called the "partisan allegiance" effect.

In a new twist, the same researchers who brought you partisan allegiance have found that an evaluator's personality may influence her judgments of psychopathy. Evaluators who score low on compassion and thrill-seeking, as measured by a widely used personality test, the NEO Personality Inventory-Revised, are more likely than others to rate criminals as psychopathic.

That’s ironic, because according to the theory of psychopathy, it's supposed to be the psychopath -- not the psychologist -- who has a deficit in empathy.

The exploratory study, forthcoming in the journal Assessment, was based on a small sample of 22 individuals who were given nine hours of training by a clinical psychologist with substantial research and forensic practice experience with the PCL-R. "The daylong session was an attempt to replicate typical PCL-R training procedures," the study authors explain.

The researchers emphasize that their findings are preliminary and need to be replicated and extended. But if they hold up, they have intriguing implications not only for the psychopathy measure but also for other psychological tests with elements of subjectivity in scoring or interpretation.

The study did not examine the accuracy of the low versus high scorers. But if low-scoring evaluators are more empathetic, this implies that they may be more accurate in interpersonal assessment contexts.  

Subterranean class conflict?

Future research might examine class background, race and philosophical beliefs to see if these influence scoring of the Psychopathy Checklist. In my informal observations, professionals who look for psychopaths under every rock tend to lack understanding of, or empathy for, those on the bottom.

Here's how that looks in practice:

The upper middle-class professional walks into the evaluation room, oblivious to the blinders and unconscious biases she brings to the table. Her subject, in contrast, is far from oblivious. With his more acute empathetic skills, the lower-class or minority individual accurately reads the professional's bias against him, which she transmits through nonverbal and other deniable cues. He also realizes that she holds all the power, and that her judgments will affect his future in very tangible ways.

He reacts with defensiveness, suspicion, or muted hostility -- especially if she is working for "the other side." But not recognizing his reaction as part of an interactional dance that she herself set in motion, the evaluator interprets his stance as evidence of intrinsic personality defect. She may see him as glib, superficially charming, conning, or manipulative -- all facets of Factor 1 (the personality dimension) on the Psychopathy Checklist.

In this interaction, all of the power belongs to the person who gets to do the labeling. Scoring and labeling the offender becomes a circular process through which the evaluator -- especially when primed by adversarial allegiance -- can  project her own class- or race-based prejudices, distancing herself from the evil other, while at the same time denying complicity. An obfuscating tautology is proffered as a simple explanation for complex and multi-determined antisocial acts.

There is more to the construct of psychopathy, of course. I focus here on its potential subjectivity because this is a facet that proponents rarely acknowledge, especially in public. Forensic experts owe a duty to explain the subjectivity of the PCL-R when it is introduced in court, where the label "psychopath" can be the kiss of death. When labeled as psychopaths:
  • Juveniles are more likely to face harsh punishment in adult court.
  • Sex offenders are more likely to be preventively detained.
  • Capital murder defendants are more likely to receive the death penalty.
So, the next time a promising young student proposes to study psychopathy or "the criminal mind," you might give her a gentle nudge in a more fruitful direction: Rather than treading this tired old path, she might contribute more to the field by studying the psyches of professionals who assign such diagnostic labels in the first place. 

January 21, 2011

How competent are the competency evaluators?

Largest real-world study finds modest agreement among independent alienists

A bad forensic report by a bad evaluator sets bad events in motion.

True story: A judge ordered a robbery suspect to undergo evaluation. A drive-by psychologist opined that the defendant was incompetent to stand trial due to schizophrenia and attention deficit/hyperactivity disorder (ADHD). The judge rubber-stamped the evaluator's opinion. The defendant was shipped off to the dysfunctional state hospital system for competency restoration treatment. There, the psychologist's diagnoses were rubber-stamped. The unruly defendant was shot full of powerful antipsychotics, given a few months of bus therapy, and proclaimed competent. The defendant had never been psychotic in the first place. Years later, he remained just as mentally retarded as ever.

"Penny-wise, pound-foolish" is the expression that comes to mind. The courts try to save money by appointing only one psychologist per case, and by paying a ludicrously small sum that encourages shoddy practices. But cleaning up the resultant messes is costly, inefficient, and fundamentally unfair.

Competency evaluations are the bread and butter of forensic work. An estimated 60,000 defendants per year -- roughly 5% of the total -- are evaluated to see whether they understand their legal situations and can rationally assist their lawyers in their defense. But for all of the importance of accurate assessments, both to a smoothly running court system and to the rights of the mentally ill to a fair trial, surprisingly little is known about the real-world accuracy of forensic evaluators.

In the case I just outlined, the judge viewed psychologists and psychiatrists as equal and interchangeable, all inherently reliable and trustworthy. At the other extreme, some believe forensic opinions are as random as a chimp with a typewriter.

Hawaii: Exemplar or exception?

Only one U.S. state squarely addresses the problem of reliability in competency evaluations. In the Aloha State, when a doubt is raised as to a defendant's competency, three separate evaluators must conduct independent evaluations. One evaluator is a state employee; the other two are independent. One must be a psychiatrist. By law, the three cannot talk with each other about the case.

This makes Hawaii the perfect setting to examine the real-world reliability of competency evaluators. In a study just accepted for publication in Law and Human Behavior, three investigators took advantage of this opportunity to conduct the largest naturalistic study ever of evaluators' agreement about competency to stand trial.

It should not be a surprise that Daniel Murrie and Marcus Boccaccini are two of the investigators. Not the types to run Psych 101 undergrads through artificial lab experiments, these two are committed to examining forensic practice in the courtroom trenches. I've blogged about their previous work exposing "partisan allegiance" effects in the real-world application of the Psychopathy Checklist (PCL-R). For the current innovative study, they teamed up with W. Neil Gowensmith of the Hawaii courts' forensic services unit.

Examining 729 reports authored by 35 evaluators, they found that all three evaluators agreed in just under three out of four -- or 71 percent -- of initial competency referrals. Agreement was a bit lower -- 61 percent -- in cases where defendants were being reevaluated after undergoing competency restoration treatment.

Consistent with the results of a hot-off-the-press meta-analysis of 50 years of competency research, evaluators believed that the broad majority of defendants referred for evaluation, about 73 percent, were competent to stand trial. This figure was somewhat lower for defendants being reevaluated after an initial finding of incompetency, with evaluators opining competence in about half of such restoration cases.

Why do evaluators differ?

As far as why agreement is not higher, the study raised more questions than it answered. The researchers sifted through the data looking for patterns, but none jumped out. Evaluators did not lean one way or the other by discipline (psychologist vs. psychiatrist) or by employer (state versus private practice). Defendant demographics were not explanatory. Nor were evaluator disagreements about diagnosis.

It would be interesting to conduct qualitative analyses of the 216 cases in this study to see whether those in which evaluators differed were more complex and ambiguous than the others. I suspect that to be the case.

Competency is nebulous. It exists along a continuum, so there is no precise cut point at which a defendant is automatically "competent" or "incompetent" to go forward with his legal case. Thus, evaluator agreement will never be perfect, nor -- necessarily -- should it be.

How did the judges rule?

One of the more intriguing aspects of the study was its exposition of how judges ruled after being presented with three reports. Not surprisingly, when evaluators were unanimous or split 2-1, the judges tended to go with the majority. But unlike the judge in the vignette I described earlier, many Hawaiian judges were independent thinkers who did not just rubber-stamp the evaluators' opinions.

When they disagreed with the opinions of the court-appointed psychologists and psychiatrists, it was typically to find a defendant incompetent. In fact, in a few cases the judges found defendants to be incompetent even when all three evaluators believed a defendant was competent. In this way, they elevated defendants' due-process rights over prosecutorial efficiency. But maybe that's just Hawaii.

Moving forward

I found the results somewhat encouraging. When not subjected to partisan allegiance pressures, forensic practitioners agreed about three-fourths of the time about whether a defendant was competent to stand trial or not.

Still, if these results are generalizable, it means evaluators will disagree in about two or three cases out of every ten. So in jurisdictions that appoint only a single evaluator, the researchers point out, many judges may be unwittingly rubber-stamping an idiosyncratic -- and even patently incorrect -- opinion:
[T]o the extent that there is a factually correct answer to the question of whether or not a defendant is competent to stand trial, relying on one evaluator increases the likelihood that the court reaches an incorrect decision (by following an incorrect single opinion that would have been revealed as a minority opinion if other evaluations were available). In some instances, this may result in delaying a trial while a defendant is unnecessarily hospitalized. In other instances this may result in a defendant proceeding to trial when additional evaluator(s) would have opined the defendant was unable to participate meaningfully in that trial….

The justice system needs to continue to wrestle with how to handle these competing demands -- efficient use of resources versus fair consideration of defendants' right to due process.
Murrie and Boccaccini are on a roll. Let's hope they keep churning out this ground-breaking line of research, examining the real-world vagaries of forensic practice, and that others climb down from the ivory towers and jump on their bandwagon.

As they note, "naturalistic studies of field reliability are an essential first step in gauging wide-scale quality across all manner of forensic practice and targeting areas for improvement."

September 3, 2010

Metaanalysis debunks psychopathy-violence link

No clear winner among violence risk tools

If you are looking for the best tool to assess someone's risk for violence, the array may seem confusing. Lots of acronyms, lots of statistical data about AUCs (Areas Under the Curve) and the like. What to do?

No worries. As it turns out, they're pretty much interchangeable. That is the bottom-line finding of a groundbreaking metaanalytic study in the APA journal Psychological Bulletin by three academic researchers from the United Kingdom.

The University of Nottingham researchers used sophisticated statistical tools to meta-analyze multiple studies on the accuracy of nine leading violence risk assessment tools. All nine turned out to have similarly moderate predictive accuracy, with none clearly leading the pack. And none -- the scholars warned -- were sufficiently accurate for courts to rely upon them as a primary basis for decision-making in forensic cases requiring "a high level of predictive accuracy, such as preventive detention."

Widely touted PCL-R's "Factor 1" a bust

In a result with potentially momentous implications for forensic practitioners, the researchers found that Factor 1 of the Psychopathy Checklist-Revised (PCL-R) does not predict violence. As you know, Factor 1 purports to measure the core constellation of a psychopathic personality (superficial charm, manipulativeness, lack of empathy, etc.). When introduced in court, evidence of psychopathy has an enormously prejudicial impact on criminal offenders.

But, the PCL-R's much-ballyhooed ability to predict certain types of violence owes only to the instrument's second factor, according to the metaanalysis by researchers Min Yang, Steve Wong, and Jeremy Coid. And that's no surprise. After all, Factor 2 measures the criminogenic factors (criminality, irresponsibility, impulsivity, history of delinquency, etc.) that even a fifth-grader knows are bad signs for a future of law-abiding citizenship.

In my experience, the Factor 1 items -- the ones purporting to measure an underlying personality profile -- are the ones more likely to be inflated by some evaluators. That's because many of these items are pretty subjective. Glib? Superficially charming? If you don't like a guy -- and/or he doesn't like you -- you are more likely to rate these negative items as present. That's one of my hypotheses for the large evaluator differences and partisan allegiance effects found with the PCL-R in forensic practice.

Cumulatively, the emerging PCL-R findings raise the question:

Why introduce the Psychopathy Checklist in court if other violence risk tools work just as well, without the implicitly prejudicial effect of labeling someone as a "psychopath"?

Psychopathy evidence skyrocketing in juvenile cases

Despite (or perhaps because of, in some cases) its prejudicial impact, the construct of psychopathy is increasingly being introduced in court cases involving juveniles. It is often used to infer that a youth should get a longer sentence because he or she is dangerous and not amenable to treatment.

[Figure: Skyrocketing use of psychopathy evidence in juvenile cases. Source: Viljoen et al., Psychology, Public Policy, and Law (2010)]


The first systematic review, published in the current issue of Psychology, Public Policy, and Law, found the use of psychopathy evidence against juveniles skyrocketing in both Canada and the United States. Psychopathy evidence is typically introduced when juveniles are being sentenced as adults and in sex offender commitment cases. It is also introduced in a variety of other cases, including ones involving disputed confessions, competency to stand trial, and criminal responsibility, report authors Jodi Viljoen, Emily MacDougall, Nathalie Gagnon, and Kevin Douglas.

In one egregious case showing how judges may improperly use evidence of psychopathy, a Canadian judge reasoned that a youth's "psychopathic device [sic] score" showed that under his "shy and unassuming" exterior lurked "a monster" that "at any time ... may well come alive." As a result, the judge sentenced this minor to an adult penitentiary.

Such inferences of unremitting danger and untreatability are improper. A large proportion of youths measured high in psychopathy score lower on psychopathy instruments once they mature. And so-called psychopathic youths are far from untreatable; in one recent study by Michael Caldwell and colleagues, after intensive treatment youths who scored high in psychopathy were actually less likely to recidivate than a comparison group in a juvenile jail.

"[T]he introduction of psychopathy evidence into juvenile forensic contexts has been somewhat rushed and premature at times," the authors conclude.

Have risk prediction tools hit the ceiling?

Researchers have been toiling for almost five decades to perfect risk prediction tools. Unfortunately, they keep running into an insurmountable obstacle: A large proportion of violence is situational. It's affected by environmental context, not just qualities internal to the individual. On top of that, rare events are always extremely hard to predict.

Based on their metaanalytic findings, the UK researchers say maybe it's time to stop searching for the holy grail. Maybe we've reached the ceiling of predictive efficacy.
Violent behavior is the result of the individual interacting with the immediate environment. Although it may be possible to improve on our understanding and predicting what an individual may do in hypothetical situations, it will be much more difficult to predict the situation that an individual actually encounters in the open community. Even predicting violence within an institutional environment is difficult, where the assessor has much more information about that environment.
Instead, they say, it is time to turn our attention to interventions that can reduce risk:
Building a better model of violence prediction should not be the sole aim of risk prediction research, which is just one link in the risk assessment-prediction-management triad that aims to achieve violence reduction and improved mental health…The risk, need and responsivity principles derived from the theory of the psychology of criminal conduct provide a useful theoretical framework for risk reduction intervention. Appropriate risk assessment can identify high-risk individuals in need of more intensive management and intervention…. Using tools with dynamic risk predictors to assess risk can identify appropriate changeable treatment targets linked to violence.
The studies included in the metaanalysis were from six countries: the United Kingdom (11), Canada (9), Sweden (3), the United States (3), Holland (2), and Germany (1). The instruments included the PCL-R, the PCL:SV, the HCR-20, the VRAG, the OGRS, the RM2000V, the LSI/LSI-R, the GSIR, and the VRS, as well as seven instrument subscales: PCL-R Factor 1 and Factor 2, the 10-item Historical subscale, the five-item Clinical subscale, and the five-item Risk Management subscale of the HCR-20; and the Static and Dynamic scales of the VRS.

Dr. Wong, former Research Director at the Regional Psychiatric Centre in Saskatoon, Saskatchewan, studied psychopathy and high-risk offenders for 25 years and developed the Violence Risk Scale and the Violence Risk Scale-sexual offender version before becoming a special professor at the Institute of Mental Health at the University of Nottingham. Dr. Yang is a professor of medical statistics with the Faculty of Medicine and Health Sciences at the University of Nottingham. And Dr. Coid, Director of the Forensic Psychiatry Research Unit, is principal investigator of the UK Home Office's Prisoner Cohort Study and also studies the epidemiology of violent and criminal behavior at the population level.
