February 5, 2013

Texas SVP jurors ignoring actuarial risk scores

Expert witness for defense makes a (small) difference, study finds

The fiery debates surrounding the validity of actuarial tools for predicting violence raise an obvious question: How much influence do these instruments really have on legal decision-makers? The answer, at least when it comes to jurors in Sexually Violent Predator trials in Texas:

Not much.

"Despite great academic emphasis on risk measures - and ongoing debates about the value, accuracy, and utility of risk-measure scores reported in SVP hearings - our findings suggest these risk measure scores may have little impact on jurors in actual SVP hearings."

The researchers surveyed 299 jurors at the end of 26 sexually violent predator trials. Unfortunately, they could not directly measure the relationship between risk scores and civil commitment decisions because, this being Texas, juries slam-dunked 25 out of 26 sex offenders, hanging in only one case (which ultimately ended in commitment after a retrial).  

Instead of the ultimate legal outcome, the researchers had to rely on proxy outcome measures, including jurors' ratings of how dangerous an individual was (specifically, how likely he would be to commit a new sex offense within one year of release), and their assessment of how difficult it was to make a decision in their case.

There was no evidence that jurors' assessments of risk or decision difficulty varied with respondents' scores on the risk assessment tools, which in each case included the Static-99, the MnSOST-R and the PCL-R. This finding, by the prolific team of Marcus Boccaccini, Daniel Murrie and colleagues, extends to the real world earlier mock-trial evidence that jurors in capital cases and other legal proceedings involving psychology experts are swayed more by clinical than by actuarial testimony.

What did make a difference to jurors was whether the defense called at least one witness, and in particular an expert witness. Overall, there was a huge imbalance in expert testimony, with almost all of the trials featuring two state experts, but only seven of 26 including even one expert called by the defense.

"Skepticism effect"

The introduction of a defense expert produced a "skepticism effect," the researchers found, in which jurors became more skeptical of experts' ability to predict future offending. However, jurors' lower risk ratings in these cases could also have been due to real differences in the cases. In SVP cases involving legitimately dangerous sex offenders, defense attorneys often have trouble finding experts willing to testify. In other words, the researchers note, "the reduced ratings of perceived risk associated with the presence of a defense expert may be due to nonrandom selection … as opposed to these defense experts' influencing jurors."

A back story here pertains to the jury pool in the Texas county in which civil commitment trials are held. All SVP trials take place in Montgomery County, a "very white community," an attorney there told me. A special e-juror selection process for SVP jurors whitens the jury pool even more, disproportionately eliminating Hispanics and African Americans. Meanwhile, many of those being referred for civil commitment are racial minorities. This potentially unconstitutional racial discrepancy is the basis for one of many current legal challenges to the SVP system in Texas.

Once a petition for civil commitment as a sexually violent predator is filed in Texas, the outcome is a fait accompli. Since the inception of the state's SVP law, only one jury has unanimously voted against civil commitment. Almost 300 men have been committed, and not a single one has been released.

Overall, the broad majority of jurors in the 26 SVP trials were of the opinion that respondents were likely to reoffend in the next year. Based on this heightened perception of risk, the researchers hypothesize that jurors may have found precise risk assessment ratings irrelevant because any risk was enough to justify civil commitment.

In a previous survey of Texas jurors, more than half reported that even a 1 percent chance of recidivism was enough to qualify a sex offender as dangerous. To be civilly committed in Texas, a sex offender must be found "likely" to reoffend, but the state's courts have not clarified what that term means.  

Risk scores could also be irrelevant to jurors motivated more by a desire for retribution than a genuine wish to protect the public, the researchers pointed out. "Although SVP laws are ostensibly designed to provide treatment and protect the public, experimental research suggests that many mock jurors make civil commitment decisions based more on retributive motives - that is, the desire to punish sexual offenses—than the utilitarian goal of protecting the public…. Jurors who adopt this mindset may spend little time thinking about risk-measure scores."

All this is not to say that actuarial scores are irrelevant. They are highly influential in the decisions that take place leading up to an SVP trial, including administrative referrals for full evaluations, the opinions of the evaluators themselves as to whether an offender meets civil commitment criteria, and decisions by prosecutors as to which cases to select for trial.

"But the influence of risk scores appears to end at the point when laypersons make decisions about civilly committing a select subgroup of sexual offenders," the researchers noted.

Bottom line: Once a petition for civil commitment as a sexually violent predator is filed in Texas, it's the end of the line. The juries are ultra-punitive, and the deck is stacked, with government experts outnumbering experts called by the defense in every case. It remains unclear to what extent these results might generalize to SVP proceedings in other states with less conservative jury pools and/or more balanced proceedings.

  • The study, "Do Scores From Risk Measures Matter to Jurors?" by Marcus Boccaccini, Darrel Turner, Craig Henderson and Caroline Chevalier of Sam Houston State University and Daniel Murrie of the University of Virginia, is slated for publication in an upcoming issue of Psychology, Public Policy, and Law. To request a copy, email the lead researcher (HERE).

October 4, 2012

Long-awaited HCR-20 update to premiere in Scotland

The long-awaited international launch of the third version of the popular HCR-20 violence risk assessment instrument has been announced for next April in Edinburgh, Scotland.

The HCR-20 is an evidence-based tool built on the structured professional judgment method, which predicts violence at least as well as the actuarial approach while yielding a more nuanced and individualized understanding. It has been evaluated in 32 different countries and translated into 18 languages.

A lot has changed in the world of risk prediction since the second edition premiered 15 years ago. Perhaps the major change in the third edition is the elimination of the need to incorporate a Psychopathy Checklist (PCL-R) score; research determined that this did not add to the instrument's predictive validity. Additionally, like the sister instrument for sex offender risk assessment, the RSVP, the HCR:V3 will focus more heavily on formulating plans to manage and reduce a person's risk, rather than merely predicting violence.

The revision process took four years, with beta testing in England, Holland, Sweden and Germany. Initial reports show very high correlations with the second edition of the HCR-20, excellent interrater reliability, and promising validity as a violence prediction tool.

The HCR:V3 will be launched at a one-day conference jointly organized by The Royal Society of Edinburgh and Violence Risk Assessment Training. Developers Christopher Webster, Stephen Hart and Kevin Douglas will be on hand to describe the research on the new instrument and its utility in violence risk assessment.

More information on the April 15, 2013 training conference is available HERE. A Webinar PowerPoint on the revision process is HERE.

August 2, 2012

Violence risk instruments overpredicting danger

Tools better at screening for low risk than pinpointing high risk 


The team of Seena Fazel and Jay Singh is at it again, bringing us yet another gigantic review of studies on the accuracy of the most widely used instruments for assessing risk of violence and sexual recidivism.


This time, the prolific researchers -- joined by UK statistician Helen Doll and Swedish professor Martin Grann -- report on a total of 73 research samples comprising 24,847 people from 13 countries. Cumulatively, the samples had a high base rate of reoffense, with almost one in four reoffending over an average of about four years.

Bottom line: Risk assessment instruments are fairly good at identifying low risk individuals, but their high rates of false positives -- people falsely flagged as recidivists -- make them inappropriate “as sole determinants of detention, sentencing, and release.”

In all, about four out of ten of those individuals judged to be at moderate to high risk of future violence went on to violently offend. Prediction of sexual reoffense was even poorer, with fewer than one out of four of those judged to be at moderate to high risk going on to sexually offend. In samples with lower base rates, the researchers pointed out, predictive accuracy will be lower still.

What that means, in practical terms, is that to stop one person who will go on to become violent again in the future, society must lock up at minimum one person who will NOT; for sex offenders, at least three non-recidivists must be detained for every recidivist. This, of course, is problematic from a human rights standpoint. 
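To make that arithmetic concrete, here is a back-of-the-envelope sketch in Python (my own illustration, plugging in the review's rounded figures rather than the authors' exact numbers):

```python
def detention_ratio(ppv):
    """Non-recidivists detained per true recidivist, if everyone
    flagged as moderate/high risk is detained. ppv is the fraction
    of flagged individuals who actually go on to reoffend."""
    return (1 - ppv) / ppv

# Rounded figures from the review (illustrative, not exact):
print(detention_ratio(0.41))  # violence: ~1.4 non-recidivists per recidivist
print(detention_ratio(0.23))  # sex offenses: ~3.3 non-recidivists per recidivist
```

The lower the positive predictive value -- and it sinks further as base rates fall -- the more false positives must be detained for every true positive.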

Another finding that goes against conventional wisdom: actuarial instruments focusing on historical risk factors performed no better than tools based on clinical judgment, contrary to some previous reviews.

The researchers included the nine most commonly used risk assessment tools, out of the many dozens that have now been developed around the world:
  • Level of Service Inventory-Revised (LSI-R) 
  • Psychopathy Checklist-Revised (PCL-R) 
  • Sex Offender Risk Appraisal Guide (SORAG) 
  • Static-99 
  • Violence Risk Appraisal Guide (VRAG) 
  • Historical, Clinical, Risk management-20 (HCR-20) 
  • Sexual Violence Risk-20 (SVR-20) 
  • Spousal Assault Risk Assessment (SARA) 
  • Structured Assessment of Violence Risk in Youth (SAVRY) 
Team leader Fazel, of Oxford University, and colleagues stressed several key implications of their findings:
One implication of these findings is that, even after 30 years of development, the view that violence, sexual, or criminal risk can be predicted in most cases is not evidence based. This message is important for the general public, media, and some administrations who may have unrealistic expectations of risk prediction for clinicians. 

A second and related implication is that these tools are not sufficient on their own for the purposes of risk assessment. In some criminal justice systems, expert testimony commonly uses scores from these instruments in a simplistic way to estimate an individual’s risk of serious repeat offending. However, our review suggests that risk assessment tools in their current form can only be used to roughly classify individuals at the group level, and not to safely determine criminal prognosis in an individual case. 

Finally, our review suggests that these instruments should be used differently. Since they had higher negative predictive values, one potential approach would be to use them to screen out low risk individuals. Researchers and policy makers could use the number safely discharged to determine the potential screening use of any particular tool, although its use could be limited for clinicians depending on the immediate and service consequences of false positives. 

A further caveat is that specificities were not high -- therefore, although the decision maker can be confident that a person is truly low risk if screened out, when someone fails to be screened out as low risk, doctors cannot be certain that this person is not low risk. In other words, many individuals assessed as being at moderate or high risk could be, in fact, low risk. 
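To see how a tool can be reasonably trustworthy for screening out low-risk individuals while still mislabeling many of them, consider a toy outcome table (all counts invented for illustration; only the roughly one-in-four base rate echoes the review's samples):

```python
# Toy 2x2 outcome table for a hypothetical risk tool.
tp, fp = 180, 420   # flagged moderate/high risk: reoffended / did not
fn, tn = 60, 340    # screened out as low risk:   reoffended / did not

npv = tn / (tn + fn)          # how trustworthy a "low risk" call is
specificity = tn / (tn + fp)  # share of non-recidivists screened out

print(f"NPV = {npv:.2f}")                  # 0.85: screened-out cases rarely reoffend
print(f"specificity = {specificity:.2f}")  # 0.45: yet most non-recidivists are NOT screened out
```

High negative predictive value with low specificity is exactly the pattern the authors describe: a "low risk" call can be trusted, but failing to be screened out tells you very little.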

My blog post on these researchers' previous meta-analytic study, Violence risk meta-meta: Instrument choice does matter, is HERE.

May 29, 2012

SVP risk tools show 'disappointing' reliability in real-world use

Rater agreement on three instruments commonly used to assess sex offenders' risk of recidivism is much lower in practice than reported in the tools' manuals, according to a new study out of Florida.

Faring most poorly was the Psychopathy Checklist (PCL-R). Correlations of scores between two evaluators hired by the same agency were in the low range. On average, psychologists differed by five points on the instrument, which has a score range of zero to 40. In one case, two evaluators were apart by a whopping 24 points!

Agreement among evaluators was only moderate on the Static-99 and the MnSOST-R, two actuarial risk assessment instruments for which scoring is relatively more straightforward.

The study, published in the respected journal Psychological Assessment, was a collaboration between scholars from the Department of Mental Health Law and Policy at the University of South Florida and researchers with the Florida Department of Children and Families. It utilized archived records culled from the almost 35,000 individuals screened for possible Sexually Violent Predator (SVP) civil commitment in Florida between 1999 and 2009. The researchers located 315 cases in which the same individual was evaluated by separate clinicians who each administered both the PCL-R and at least one of the two actuarial measures within a short enough time frame to enable direct scoring comparisons.
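For readers curious about the mechanics, here is a minimal sketch of the simplest rater-agreement arithmetic (the paired scores below are invented; the study itself reports more sophisticated indices, such as intraclass correlations):

```python
from statistics import mean, pstdev

# Invented paired PCL-R totals (range 0 to 40) from two evaluators
# rating the same eight individuals -- illustration only.
rater_a = [31, 24, 18, 29, 12, 35, 22, 27]
rater_b = [26, 30, 15, 24, 20, 29, 25, 19]

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

gap = mean([abs(a - b) for a, b in zip(rater_a, rater_b)])
print(f"correlation: {pearson_r(rater_a, rater_b):.2f}")
print(f"average score gap: {gap:.1f} points")  # 5.5 here, echoing the study's ~5
```

A simple correlation is only part of the story: two raters can correlate highly while one scores everyone several points higher, which is why field studies favor intraclass correlations that also penalize such systematic differences.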

It would be a mistake to lean too heavily on the results of a single isolated study. But the present findings add to a burgeoning body of research from several independent groups, all pointing to troubling problems with the accuracy of instruments designed to forecast risk of recidivism among sex offenders.

Related study: Psychopathy and sexual deviance not predictive

Collectively, the research has been especially critical of the ability of the highly prejudicial construct of psychopathy to add meaningfully to risk prediction in this high-stakes arena. Indeed, just this week another study has come out indicating that neither psychopathy scores nor sexual deviance measures improve on the accuracy provided by an actuarial instrument alone.

An especially interesting finding of that Canadian study is that reoffense rates remained below 12 percent over a six-year follow-up period even for the highest-risk offenders -- those with high risk ratings on the Static-99R plus high levels of psychopathy and sexual deviance (as measured by phallometric testing). This makes it inappropriate to inflate risk estimates over and above those derived from Static-99R scores alone, the authors caution.

Item-level analysis finds varying rates of accuracy

A unique contribution of the Florida study is its analysis of the relative accuracy of every single item in each of the three instruments studied. Handy tables allow a forensic practitioner to see which items have the poorest reliability, meaning they should be viewed skeptically by forensic decision-makers.

For example, take the MnSOST-R, a now-defunct instrument with a score range of –14 to 31 points. The total gap between evaluators was as wide as 19 points; the items with the greatest variability in scoring were those pertaining to offenders' functioning during incarceration, such as participation in treatment.

Meanwhile, the weak performance of the Psychopathy Checklist owes much to the items on its so-called “Factor 1,” which attempt to measure the personality style of the psychopath. As I've discussed before, rating someone as “glib,” “callous” or “shallow” is a highly subjective enterprise that opens the door to a veritable avalanche of personal bias.

Piggy-backing off a recommendation by John Edens and colleagues, the Florida team suggests that the prejudicial deployment of the Psychopathy Checklist may be superfluous, in that scores on Factor 2 alone (the items reflecting a chronic criminal lifestyle) are more predictive of future violence or sexual recidivism.

Next up, we need to identify the causes of the poor interrater reliability for forensic risk prediction instruments in real-world settings. Is it due to inadequate training, differing clinical skills, variable access to collateral data, intentional or unintentional bias on the part of examiners, adversarial allegiance effects (not a factor in the present study, since both evaluators were appointed by the same agency), or some combination?

In the meantime, the fact that two evaluators working on the same side cannot reliably arrive at the same risk rating for any particular individual should certainly raise our skepticism about the validity of risk prediction based on these instruments.

The studies are:

Reliability of Risk Assessment Measures Used in Sexually Violent Predator Proceedings. Cailey Miller, Eva Kimonis, Randy Otto, Suzonne Kline and Adam Wasserman. Psychological Assessment. Advance online publication, 7 May 2012. Click HERE to contact the authors.

Does Consideration of Psychopathy and Sexual Deviance Add to the Predictive Validity of the Static-99R? Jan Looman, Nicola A. C. Morphett and Jeff Abracen. International Journal of Offender Therapy and Comparative Criminology. Published online 28 May 2012. Click HERE to contact the authors.



October 30, 2011

Study: Psychopathy score fails to predict sexual recidivism

Many forensic psychologists believe that psychopathy is a risk factor for sex offender recidivism. Not surprisingly, when forensic psychologists assign a sex offender a high score on a psychopathy test, it increases the risk of extreme legal sanctions such as civil commitment.

But a new study out of Texas found zero correlation between sexual recidivism and psychopathy, as measured by the widely used Psychopathy Checklist (PCL-R). If anything, sex offenders who were arrested for a new sexually violent offense tended to have lower scores on the PCL-R than those who were not rearrested!

Regular blog readers should be familiar with these researchers by now: Dan Murrie, Marcus Boccaccini and crew are the same scholars who informed us of the partisan allegiance phenomenon, in which evaluators working for the government tend to assign far higher PCL-R scores than do those working for the defense.

In their most recent study, they examined PCL-R scores from about 400 sex offenders in Texas who were released from prison and then tracked for anywhere from about two to seven years. They examined not just the total scores on the PCL-R, but also scores on the instrument's two factors, as well as four so-called facet scores. Not one of these seven PCL-R variables was a statistically significant predictor of whether a man would be arrested for a new sex crime.

“Overall, these predictive validity findings were striking because the PCL-R apparently failed to predict the type of outcome (i.e., sexually violent reoffense) for which it was administered in this context,” the authors noted.

Further, in cases in which the PCL-R was administered by more than one evaluator, the study found poor agreement between the two, even though both were working for the government. Consistent with prior research, interrater agreement was higher on Factor 2, which measures antisocial behavior and an impulsive lifestyle, than on Factor 1, which measures the vaguely operationalized personality and emotional dynamics thought to underlie psychopathy.

In an interesting twist, the researchers tried to determine whether some evaluators were more accurate than others at predicting recidivism through PCL-R scores. They identified four highly prolific evaluators; together, these three psychologists and one medical doctor had provided almost two-thirds of the PCL-R scores in the study. Although the PCL-R scores of three of these four evaluators were more likely than other evaluators' scores to correlate with a new arrest for a non-sexual crime, even these evaluators could not produce PCL-R scores that predicted sexual offense recidivism.

Despite the PCL-R’s lack of predictive validity, sex offenders with higher PCL-R scores were more likely than others to be recommended for civil commitment, indicating that the unreliable rating was far from harmless in forensic practice.

The study is: 

Murrie, D. C., Boccaccini, M. T., Caperton, J. and Rufino, K. Field Validity of the Psychopathy Checklist–Revised in Sex Offender Risk Assessment. Psychological Assessment. Click HERE to request a copy from the first author, at the Institute of Law, Psychiatry, and Public Policy.


June 19, 2011

Violence risk meta-meta: Instrument choice does matter

Despite popularity, psychopathy test and actuarials not superior to other prediction methods 

The past couple of decades have seen an explosion of interest in forensic assessment of risk for future violent and sexual recidivism. Accordingly, evaluators can now choose from an array of more than 120 different risk assessment tools. But should this choice be based on individual preference, or are some instruments clearly superior to others?

Several systematic reviews and metaanalyses have addressed this question, but their conclusions often conflict. In the first systematic review of these reviews (called a “meta-review”), Jay Singh and Seena Fazel of Oxford University found that methodological shortcomings may contribute to the confusion. Problems they identified in the 40 metaanalyses and reviews they studied included authors' failure to adequately describe their study search procedures, failure to check for overlapping samples or publication bias, and failure to investigate the confound of sample heterogeneity.

The Oxford scholars, along with Martin Grann of Sweden's Centre for Violence Prevention, set out to rectify this problem via a more methodologically rigorous meta-review, using optimal data analyses and reporting procedures. For this purpose, they used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), a 27-item checklist designed to enable transparent and consistent reporting of results.

For their meta-meta (a metaanalysis of the metaanalyses), they collected data from 68 studies involving about 26,000 participants in 13 countries, focusing on the accuracy of the nine most commonly used forensic risk assessment instruments:
  • Psychopathy Checklist (PCL-R)
  • Static-99
  • Historical, Clinical, Risk Management-20 (HCR-20)
  • Violence Risk Appraisal Guide (VRAG)
  • Sexual Violence Risk-20 (SVR-20)
  • Level of Service Inventory (LSI-R)
  • Sex Offender Risk Appraisal Guide (SORAG)
  • Spousal Assault Risk Assessment (SARA)
  • Structured Assessment of Violence Risk in Youth (SAVRY)
Big differences in predictive validity

As it turns out, these widely used instruments vary substantially in predictive accuracy. Performing the best was the SAVRY, a risk assessment instrument designed for use with adolescents. At the bottom were the Level of Service Inventory and the Psychopathy Checklist. This is not too surprising, as the LSI-R is used with a wide variety of general offenders, and the PCL-R was not designed for risk prediction in the first place.



The present metaanalysis would therefore argue against the view of some experts that the PCL-R is unparalleled in its ability to predict future offending.

Statistical method matters: DOR outperforms AUC

The researchers compared several different methods of measuring predictive accuracy. They found that a popular statistic called the Area Under the Curve (AUC) was the weakest. Use of the AUC statistic may help to explain why some metaanalyses were unable to find significant differences among instruments, the authors theorize.

Better methods for comparing instruments’ predictive accuracy include calculating positive and negative predictive values and also using something called the Diagnostic Odds Ratio, or DOR. This is the ratio of the odds of a positive test result in an offender (true positive) relative to the odds of a positive result in a non-offender (false positive). The authors’ summary performance scores pooled results from all four statistical methods.
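For the statistically inclined, the DOR falls straight out of a two-by-two outcome table. A minimal sketch with invented counts:

```python
# Invented 2x2 counts for a hypothetical risk instrument.
tp, fp = 60, 90    # flagged high risk: reoffended / did not
fn, tn = 20, 230   # judged low risk:   reoffended / did not

# Diagnostic Odds Ratio: odds of a positive result among recidivists,
# divided by the odds of a positive result among non-recidivists.
dor = (tp / fn) / (fp / tn)   # equivalently (tp * tn) / (fp * fn)
print(f"DOR = {dor:.1f}")     # 7.7 here; 1.0 would mean no discrimination
```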

Actuarials not superior; race also matters

The poor performance of the Psychopathy Checklist (PCL-R) was not the only finding that may surprise some forensic evaluators. The researchers also found no evidence that actuarial tools – such as the widely touted Static-99 – outperform structured clinical judgment methods like the HCR-20 or the SVR-20.

They also found that an offender's race is critical to predictive accuracy. Risk assessment instruments perform best on white offenders, most likely because white offenders predominate in the underlying studies. This is consistent with other research, including a study by Dernevick and colleagues finding that risk assessment instruments are poor at predicting misconduct in terrorists.

Caution is therefore warranted when using any risk assessment tool to predict offending in samples dissimilar to their validation samples, the authors stress.

This systematic review appears to be the most methodologically rigorous such study to date, in a rapidly evolving field. I recommend obtaining both articles (see below for author contact information) and studying them carefully. The stakes are high, and it behooves us to use the instruments that are the most accurate for the specific purpose at hand.


May 26, 2011

PCL-R inventor wringing his hands over forensic misuse

The first part of NPR's series on the psychopathy industry aired today and the transcript is now online (HERE), along with my sidebar essay on the cultural history of psychopathy (HERE). Most fascinating to me are recent efforts by Robert Hare, inventor of the popular Psychopathy Checklist-Revised (PCL-R), to distance himself from growing evidence of its misuse in forensic contexts:
[Photo: Robert Dixon Jr., featured in the NPR story]
While Hare remains a strong believer that his test works well for the kind of basic scientific research that it was originally designed for, he and others have begun to wonder if it does as good a job outside the lab.

"I'm very concerned about the inappropriate use of this instrument for purposes that have serious implications for individuals and for society," Hare says. "It shouldn't work that way."

In fact, Hare says, he is so disturbed by some of what he has seen as he has traveled through America training psychologists in use of the PCL-R, that he sometimes has trouble focusing on the way his test could be affecting people's lives.

"I think about this periodically, and I probably try to suppress it," Hare says. "I do disassociate myself from it. I mean, if I thought about every potential use or misuse of the instrument, I probably wouldn't sleep at all."

"Alarming world of globe-trotting experts"

Hare goes even further in a series of interviews with journalist Jon Ronson, author of the new book, The Psychopath Test. Over late-night drinks at hotel bars, he tells the author that he is especially chagrined at the PCL-R’s use by poorly trained and biased evaluators in Sexually Violent Predator (SVP) cases in the United States:
“ ‘I tried to train some of the people who administer it. They were sitting around, twiddling their thumbs, rolling their eyes, doodling, cutting their fingernails – these were people who were going to use it.’

“A Coalinga psychiatrist, Michael Freer, told the Los Angeles Times in 2007 that more than a third of Coalinga ‘individuals’ (as the inmates there are called) had been misdiagnosed as violent predators and would in fact pose no threat to the public if released. ‘They did their time, and suddenly they are picked up again and shipped off to a state hospital for essentially an indeterminate period of time,’ Freer said. ‘To get out they have to demonstrate that they are no longer a risk, which can be a very high standard. So, yeah, they do have grounds to be very upset.’

“In the executive bar, Bob Hare continued. He told me of an alarming world of globe-trotting experts, forensic psychologists, criminal profilers, traveling the planet armed with nothing much more than a Certificate of Attendance, just like the one I had. These people might have influence inside parole hearings, death penalty hearings, serial-killer incident rooms, and on and on. I think he saw his checklist as something pure – innocent as only science can be – but the humans who administered it as masses of weird prejudices and crazy dispositions.”

If Hare’s conscience is really bothering him, he could do more than try to distance himself from miscarriages of justice in interviews with journalists after the fact. He could stop training the legions of government SVP evaluators in the United States, and/or issue a policy statement about the misuse of his instrument in court.

Of course, that would mean a significant loss of revenue. Even Ronson, the author of The Psychopath Test, had to pay 400 pounds (media discount) to gain access to Hare at a 3-day PCL-R training course. And that didn’t include the cost of the 30-page manual, another 361 pounds.



My review of The Psychopath Test at Amazon:

The power to label is intoxicating. That’s what Jon Ronson found after taking a 3-day training that gave him license to diagnose people as psychopaths. Armed with a 40-item checklist, the journalist went gallivanting around the globe, sniffing out prospective psychopaths from convicted murderers to corporate job-slashers and Haitian war criminals. Ronson’s chronicle of his two-year quest for the elusive psychopath is at times whimsical, sometimes laugh-out-loud funny, and always riveting.
The review continues HERE. (As always, if you enjoy it, please click “yes.”)

May 25, 2011

NPR series on psychopathy in court

This Thursday and Friday, NPR is airing a 2-part series about the controversial use of psychopathy in court.

In Part I on All Things Considered, psychology and science reporter Alix Spiegel shows the profound negative consequences that a psychopathy designation can have on the lives of those who receive it. She profiles a California prisoner whose friends and family do not believe he is a psychopath, but who will likely never win parole due to that label.

Part II (on Friday) examines the history of the Psychopathy Checklist (PCL-R) and explores how it became so entrenched in the criminal justice system. Dan Murrie’s research questioning the reliability of the PCL in adversarial contexts is highlighted. Robert Hare also expresses concerns about the way his tool is being used in court, and the corrupting influences of the adversarial system.

The show will be complemented by an online debate, “Does the PCL-R test have value in the criminal justice system?” I am one of the featured experts on the panel, so I hope some of you will visit the website; I will provide a link just as soon as it goes live.

For those of you who would rather tune in once, the show is also airing as a single piece on Friday on the radio program This American Life, and you will be able to download the podcast there as well.

Part I of the 2-part series is HERE; the online debate is HERE.

March 25, 2011

Psychopathy: A Rorschach test for psychologists?

  • Compassion
  • Empathy
  • Impulsivity
  • Excitement-seeking
What do these personality traits have in common?

If you are high on any or all of them, you may be less likely to rate other people as psychopathic on the Psychopathy Checklist (PCL-R).

The PCL-R is the most widely used measure of psychopathy in the world. But in real-world forensic settings, scores vary widely depending upon which side retained the evaluator. This finding is called the "partisan allegiance" effect.

In a new twist, the same researchers who brought you partisan allegiance have found that an evaluator's personality may affect her judgments of psychopathy. Evaluators low on compassion and thrill-seeking, as measured by a widely used personality test, the NEO Personality Inventory-Revised, are more likely than others to rate criminals as psychopathic.

That’s ironic, because according to the theory of psychopathy, it's supposed to be the psychopath -- not the psychologist -- who has a deficit in empathy.

The exploratory study, forthcoming in the journal Assessment, was based on a small sample of 22 individuals who were given nine hours of training by a clinical psychologist with substantial research and forensic practice experience with the PCL-R. "The daylong session was an attempt to replicate typical PCL-R training procedures," the study authors explain.

The researchers emphasize that their findings are preliminary and need to be replicated and extended. But if they hold up, they have intriguing implications not only for the psychopathy measure but also for other psychological tests with elements of subjectivity in scoring or interpretation.

The study did not examine the accuracy of the low versus high scorers. But if low-scoring evaluators are more empathetic, this implies that they may be more accurate in interpersonal assessment contexts.  

Subterranean class conflict?

Future research might examine class background, race and philosophical beliefs to see if these influence scoring of the Psychopathy Checklist. In my informal observations, professionals who look for psychopaths under every rock tend to lack understanding of, or empathy for, those on the bottom.

Here's how that looks in practice:

The upper middle-class professional walks into the evaluation room, oblivious to the blinders and unconscious biases she brings to the table. Her subject, in contrast, is far from oblivious. With his more acute empathetic skills, the lower-class or minority individual accurately reads the professional's bias against him, which she transmits through nonverbal and other deniable cues. He also realizes that she holds all the power, and that her judgments will affect his future in very tangible ways.

He reacts with defensiveness, suspicion, or muted hostility -- especially if she is working for "the other side." But not recognizing his reaction as part of an interactional dance that she herself set in motion, the evaluator interprets his stance as evidence of intrinsic personality defect. She may see him as glib, superficially charming, conning, or manipulative -- all facets of Factor 1 (the personality dimension) on the Psychopathy Checklist.

In this interaction, all of the power belongs to the person who gets to do the labeling. Scoring and labeling the offender becomes a circular process through which the evaluator -- especially when primed by adversarial allegiance -- can project her own class- or race-based prejudices, distancing herself from the evil other while denying complicity. An obfuscating tautology is proffered as a simple explanation for complex and multi-determined antisocial acts.

There is more to the construct of psychopathy, of course. I focus here on its potential subjectivity because this is a facet that proponents rarely acknowledge, especially in public. Forensic experts owe a duty to explain the subjectivity of the PCL-R when it is introduced in court, where the label "psychopath" can be the kiss of death. When labeled as psychopaths:
  • Juveniles are more likely to face harsh punishment in adult court.
  • Sex offenders are more likely to be preventively detained.
  • Capital murder defendants are more likely to receive the death penalty.
So, the next time a promising young student proposes to study psychopathy or "the criminal mind," you might give her a gentle nudge in a more fruitful direction: Rather than treading this tired old path, she might contribute more to the field by studying the psyches of professionals who assign such diagnostic labels in the first place. 

January 21, 2011

How competent are the competency evaluators?

Largest real-world study finds modest agreement among independent alienists

A bad forensic report by a bad evaluator sets bad events in motion.

True story: A judge ordered a robbery suspect to undergo evaluation. A drive-by psychologist opined that the defendant was incompetent to stand trial due to schizophrenia and attention deficit/hyperactivity disorder (ADHD). The judge rubber-stamped the evaluator's opinion. The defendant was shipped off to the dysfunctional state hospital system for competency restoration treatment. There, the psychologist's diagnoses were rubber-stamped. The unruly defendant was shot full of powerful antipsychotics, given a few months of bus therapy, and proclaimed competent. The defendant had never been psychotic in the first place. Years later, he remained just as mentally retarded as ever.

"Penny-wise, pound-foolish" is the expression that comes to mind. The courts try to save money by appointing only one psychologist per case, and by paying a ludicrously small sum that encourages shoddy practices. But cleaning up the resultant messes is costly, inefficient, and fundamentally unfair.

Competency evaluations are the bread and butter of forensic work. An estimated 60,000 defendants per year -- roughly 5% of the total -- are evaluated to see whether they understand their legal situations and can rationally assist their lawyers in their defense. But for all of the importance of accurate assessments, both to a smoothly running court system and to the rights of the mentally ill to a fair trial, surprisingly little is known about the real-world accuracy of forensic evaluators.

In the case I just outlined, the judge viewed psychologists and psychiatrists as equal and interchangeable, all inherently reliable and trustworthy. At the other extreme, some believe forensic opinions are as random as a chimp with a typewriter.

Hawaii: Exemplar or exception?

Only one U.S. state squarely addresses the problem of reliability in competency evaluations. In the Aloha State, when a doubt is raised as to a defendant's competency, three separate evaluators must conduct independent evaluations. One evaluator is a state employee; the other two are independent. One must be a psychiatrist. By law, the three cannot talk with each other about the case.

This makes Hawaii the perfect setting to examine the real-world reliability of competency evaluators. In a study just accepted for publication in Law and Human Behavior, three investigators took advantage of this opportunity to conduct the largest naturalistic study ever of evaluators' agreement about competency to stand trial.

It should not be a surprise that Daniel Murrie and Marcus Boccaccini are two of the investigators. Not the types to run Psych 101 undergrads through artificial lab experiments, these two are committed to examining forensic practice in the courtroom trenches. I've blogged about their previous work exposing "partisan allegiance" effects in the real-world application of the Psychopathy Checklist (PCL-R). For the current innovative study, they teamed up with W. Neil Gowensmith of the Hawaii courts' forensic services unit.

Examining 729 reports authored by 35 evaluators, they found that all three evaluators agreed in just under three out of four -- or 71 percent -- of initial competency referrals. Agreement was a bit lower -- 61 percent -- in cases where defendants were being reevaluated after undergoing competency restoration treatment.

Consistent with the results of a hot-off-the-press meta-analysis of 50 years of competency research, evaluators believed that the broad majority of defendants referred for evaluation, about 73 percent, were competent to stand trial. This figure was somewhat lower for defendants being reevaluated after an initial finding of competency, with evaluators opining competence in about half of such restoration cases.
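As a rough calibration of that 71 percent figure, it helps to ask what chance alone would produce. A back-of-the-envelope sketch (my own illustration, not a computation from the study, and it assumes the three evaluators decide independently):

```python
# If three evaluators each independently called 73% of defendants
# "competent" purely at random, how often would all three agree?
p = 0.73
chance = p**3 + (1 - p)**3   # all say competent, or all say incompetent
print(f"{chance:.2f}")       # ~0.41 -- well below the observed 71 percent
```

The observed agreement thus reflects genuine shared signal rather than three coin flips happening to land the same way.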

Why do evaluators differ?

As far as why agreement is not higher, the study raised more questions than it answered. The researchers sifted through the data looking for patterns, but none jumped out. Evaluators did not lean one way or the other by discipline (psychologist vs. psychiatrist) or by employer (state versus private practice). Defendant demographics were not explanatory. Nor were evaluator disagreements about diagnosis.

It would be interesting to conduct qualitative analyses of the 216 cases in this study to see whether those in which evaluators differed were more complex and ambiguous than the others. I suspect that to be the case.

Competency is nebulous. It exists along a continuum, so there is no precise cut point at which a defendant is automatically "competent" or "incompetent" to go forward with his legal case. Thus, evaluator agreement will never be perfect, nor -- necessarily -- should it be.

How did the judges rule?

One of the more intriguing aspects of the study was its exposition of how judges ruled after being presented with three reports. Not surprisingly, when evaluators were unanimous or split 2-1, the judges tended to go with the majority. But unlike the judge in the vignette I described earlier, many Hawaiian judges were independent thinkers who did not just rubber-stamp the evaluators' opinions.

When they disagreed with the opinions of the court-appointed psychologists and psychiatrists, it was typically to find a defendant incompetent. In fact, in a few cases the judges found defendants to be incompetent even when all three evaluators believed a defendant was competent. In this way, they elevated defendants' due-process rights over prosecutorial efficiency. But maybe that's just Hawaii.

Moving forward

I found the results somewhat encouraging. When not subjected to partisan allegiance pressures, forensic practitioners agreed about three-fourths of the time about whether a defendant was competent to stand trial or not.

Still, if these results are generalizable, it means evaluators will disagree in about two or three cases out of every ten. So in jurisdictions that appoint only a single evaluator, the researchers point out, many judges may be unwittingly rubber-stamping an idiosyncratic -- and even patently incorrect -- opinion:
[T]o the extent that there is a factually correct answer to the question of whether or not a defendant is competent to stand trial, relying on one evaluator increases the likelihood that the court reaches an incorrect decision (by following an incorrect single opinion that would have been revealed as a minority opinion if other evaluations were available). In some instances, this may result in delaying a trial while a defendant is unnecessarily hospitalized. In other instances this may result in a defendant proceeding to trial when additional evaluator(s) would have opined the defendant was unable to participate meaningfully in that trial….

The justice system needs to continue to wrestle with how to handle these competing demands -- efficient use of resources versus fair consideration of defendants' right to due process.
Murrie and Boccaccini are on a roll. Let's hope they keep churning out this ground-breaking line of research, examining the real-world vagaries of forensic practice, and that others climb down from the ivory towers and jump on their bandwagon.

As they note, "naturalistic studies of field reliability are an essential first step in gauging wide-scale quality across all manner of forensic practice and targeting areas for improvement."

September 3, 2010

Metaanalysis debunks psychopathy-violence link

No clear winner among violence risk tools

If you are looking for the best tool to assess someone's risk for violence, the array may seem confusing. Lots of acronyms, lots of statistical data about AUCs (Areas Under the Curve) and the like. What to do?

No worries. As it turns out, they're pretty much interchangeable. That is the bottom-line finding of a groundbreaking metaanalytic study in the APA journal Psychological Bulletin by three academic researchers from the United Kingdom.

The University of Nottingham researchers used sophisticated statistical tools to meta-analyze multiple studies on the accuracy of nine leading violence risk assessment tools. All nine turned out to have similarly moderate predictive accuracy, with none clearly leading the pack. And none -- the scholars warned -- were sufficiently accurate for courts to rely upon them as a primary basis for decision-making in forensic cases requiring "a high level of predictive accuracy, such as preventive detention."

Widely touted PCL-R's "Factor 1" a bust

In a result with potentially momentous implications for forensic practitioners, the researchers found that Factor 1 of the Psychopathy Checklist-Revised (PCL-R) does not predict violence. As you know, Factor 1 purports to measure the core constellation of a psychopathic personality (superficial charm, manipulativeness, lack of empathy, etc.). When introduced in court, evidence of psychopathy has an enormously prejudicial impact on criminal offenders.

But, the PCL-R's much-ballyhooed ability to predict certain types of violence owes only to the instrument's second factor, according to the metaanalysis by researchers Min Yang, Steve Wong, and Jeremy Coid. And that's no surprise. After all, Factor 2 measures the criminogenic factors (criminality, irresponsibility, impulsivity, history of delinquency, etc.) that even a fifth-grader knows are bad signs for a future of law-abiding citizenship.

In my experience, the Factor 1 items -- the ones purporting to measure an underlying personality profile -- are the ones more likely to be inflated by some evaluators. That's because many of these items are pretty subjective. Glib? Superficially charming? If you don't like a guy -- and/or he doesn't like you -- you are more likely to rate these negative items as present. That's one of my hypotheses for the large evaluator differences and partisan allegiance effects found with the PCL-R in forensic practice.

Cumulatively, the emerging PCL-R findings raise the question:

Why introduce the Psychopathy Checklist in court if other violence risk tools work just as well, without the implicitly prejudicial effect of labeling someone as a "psychopath"?

Psychopathy evidence skyrocketing in juvenile cases

Despite (or perhaps because of, in some cases) its prejudicial impact, the construct of psychopathy is increasingly being introduced in court cases involving juveniles. It is often used to infer that a youth should get a longer sentence because he or she is dangerous and not amenable to treatment.

[Figure: Skyrocketing use of psychopathy evidence in juvenile cases. Source: Viljoen et al., Psychology, Public Policy, and Law (2010)]


The first systematic review, published in the current issue of Psychology, Public Policy, and Law, found the use of psychopathy evidence against juveniles skyrocketing in both Canada and the United States. Psychopathy evidence is typically introduced when juveniles are being sentenced as adults and in sex offender commitment cases. It is also introduced in a variety of other cases, including ones involving disputed confessions, competency to stand trial, and criminal responsibility, report authors Jodi Viljoen, Emily MacDougall, Nathalie Gagnon, and Kevin Douglas.

In one egregious case showing how judges may improperly use evidence of psychopathy, a Canadian judge reasoned that a youth's "psychopathic device [sic] score" showed that under his "shy and unassuming" exterior lurked "a monster" that "at any time ... may well come alive." As a result, the judge sentenced this minor to an adult penitentiary.

Such inferences of unremitting danger and untreatability are improper. A large proportion of youths measured high in psychopathy score lower on psychopathy instruments once they mature. And so-called psychopathic youths are far from untreatable; in one recent study by Michael Caldwell and colleagues, after intensive treatment youths who scored high in psychopathy were actually less likely to recidivate than a comparison group in a juvenile jail.

"[T]he introduction of psychopathy evidence into juvenile forensic contexts has been somewhat rushed and premature at times," the authors conclude.

Have risk prediction tools hit the ceiling?

Researchers have been toiling for almost five decades to perfect risk prediction tools. Unfortunately, they keep running into an insurmountable obstacle: A large proportion of violence is situational. It's affected by environmental context, not just qualities internal to the individual. And not only that, but it is always extremely hard to predict a rare event.

Based on their metaanalytic findings, the UK researchers say maybe it's time to stop searching for the holy grail. Maybe we've reached the ceiling of predictive efficacy.
Violent behavior is the result of the individual interacting with the immediate environment. Although it may be possible to improve on our understanding and predicting what an individual may do in hypothetical situations, it will be much more difficult to predict the situation that an individual actually encounters in the open community. Even predicting violence within an institutional environment is difficult, where the assessor has much more information about that environment.
Instead, they say, it is time to turn our attentions to interventions that can reduce risk:
Building a better model of violence prediction should not be the sole aim of risk prediction research, which is just one link in the risk assessment-prediction-management triad that aims to achieve violence reduction and improved mental health…The risk, need and responsivity principles derived from the theory of the psychology of criminal conduct provide a useful theoretical framework for risk reduction intervention. Appropriate risk assessment can identify high-risk individuals in need of more intensive management and intervention…. Using tools with dynamic risk predictors to assess risk can identify appropriate changeable treatment targets linked to violence.
The studies included in the metaanalysis were from six countries: the United Kingdom (11), Canada (9), Sweden (3), the United States (3), Holland (2), and Germany (1). The instruments included the PCL-R, the PCL:SV, the HCR-20, the VRAG, the OGRS, the RM2000V, the LSI/LSI-R, the GSIR, and the VRS, as well as seven instrument subscales: PCL-R Factor 1 and Factor 2; the 10-item Historical, five-item Clinical, and five-item Risk Management subscales of the HCR-20; and the Static and Dynamic scales of the VRS.

Dr. Wong, former Research Director at the Regional Psychiatric Centre in Saskatoon, Saskatchewan, studied psychopathy and high-risk offenders for 25 years and developed the Violence Risk Scale and the Violence Risk Scale-sexual offender version before becoming a special professor at the Institute of Mental Health at the University of Nottingham. Dr. Yang is a professor of medical statistics with the Faculty of Medicine and Health Sciences at the University of Nottingham. And Dr. Coid, Director of the Forensic Psychiatry Research Unit, is principal investigator of the UK Home Office's Prisoner Cohort Study and also studies the epidemiology of violent and criminal behavior at the population level.


June 18, 2010

New study on juvenile sex offender treatment

Efficacy claimed, but control group questionable

I previously reviewed forensic psychologist Frank DiCataldo's excellent book on juvenile sex offenders, in which he claims there is very little good research to show that sex offender-specific treatment is useful with adolescents. As DiCataldo points out, most of the studies (and the meta-analyses of studies) purporting to show a treatment effect have serious flaws. One big problem is the lack of control groups. This makes it impossible to know whether juveniles who did well after sex-offender treatment would have done equally well without treatment or with generic treatment not focused on sex offending.

But now, a study out of Canada (where else?!) is being trumpeted as methodologically sound proof that -- drum roll here -- treatment works. The study followed 148 adolescents for up to 20 years, which is a very long time for this type of research. Overall, only 17 of the subjects (about 11.5%) picked up a new sex offense as adults, with another 7 getting a new charge only in adolescence, for a total of 24 recidivists (16%). That's in line with a growing body of data on the very low recidivism rates of juvenile sex offenders, ranging from about 4% to 15%.

When the researchers broke the sample down by treatment, they found that only 9% (5 out of 58) of the youths who had gone through their specialized program were charged with a new sex offense over the next 20 years, compared with 21% (19 out of 90) of those who had not. That's a statistically significant difference.
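The significance claim does check out on the raw counts. Here is a quick reconstruction of the test (my own sketch from the reported numbers; the authors' actual analysis may have differed):

```python
# 2x2 table reconstructed from the counts reported above.
a, b = 5, 53    # treated:   reoffended / did not (n = 58)
c, d = 19, 71   # untreated: reoffended / did not (n = 90)

n = a + b + c + d
# Pearson chi-square for a 2x2 table, without continuity correction.
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
print(f"chi-square = {chi2:.2f}")  # ~4.05 > 3.84, so p < .05 on 1 df
# Caveat: with Yates' continuity correction the statistic drops to
# roughly 3.2, below the .05 threshold -- the effect is on the edge.
```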

But here's the rub. The participants were not randomly assigned to treatment (versus no treatment), which is how it's supposed to be done in psychotherapy treatment outcome research. Otherwise, you never know if there is something about the selection process that affected the results. Not only was assignment to groups not random, but the researchers put the kids who refused to undergo treatment, along with those who dropped out of treatment early, into their so-called "control" group! In fact, these bad boys comprised fully half of the non-treatment group. My guess is that these dropouts and refusers were probably a whole lot more delinquent than the other kids in the first place.

So it's possible that what the research really shows is not that treatment works, but that hardcore delinquents who refuse or drop out of treatment are likely to get into more trouble in their later teens and early 20s. It would be interesting to see if the control group still showed a higher recidivism rate if they removed the dropouts and refusers from the analyses. In fact, I would love to see some qualitative analyses of who those 19 recidivists (out of the total of 90 in the control group) are.

Another potential confounder I noticed was that many of the adolescents in the non-treatment control group were apparently in some different kind of treatment at the time. That treatment is not described, so perhaps this study is more of a between-treatments design, rather than a study of treatment versus no treatment.

Again, as I've discussed previously, part of the "problem" both with accurately predicting which juveniles will reoffend and also with designing treatment programs that work is the very low overall rate of recidivism among juveniles who have sexually offended. As DiCataldo and others have pointed out, if you just predict that no juvenile caught for a sex crime will reoffend, you will be correct in the broad majority of cases.

Flattening trajectory

The study, by psychologist James Worling and his colleagues at the Sexual Abuse: Family Education & Treatment (SAFE-T) Program in Ontario, did have some other interesting findings. The researchers found that most recidivism -- both sexual and nonsexual -- happens within the first few years. Offending flattens out significantly at about the 10-year mark, when folks hit about the age of 25. This is consistent with the recent study by Lussier and colleagues of sex offender trajectories, as well as the general criminology literature on desistance. Crime, including sex offending, is a young man's game.

The finding that only 11.5% of the participants were charged with subsequent sexual offenses as adults is also in line with other research showing very low recidivism for juveniles. A strength of this study is its long course; it followed kids all the way up to an average age of 31.

The study is: Worling, J.R., Littlejohn, A., & Bookalam, D. (2010). 20-year prospective follow-up study of specialized treatment for adolescents who offended sexually. Behavioral Sciences and the Law, 28, 46-57.

Special journal issue on adolescent sex offenders

NOTE: It looks like this entire special issue of Behavioral Sciences and the Law is available online. Other interesting articles in the issue include:

Inter-rater reliability of the PCL-R total and factor scores among psychopathic sex offenders: are personality features more prone to disagreement than behavioral features?
John F. Edens, Marcus T. Boccaccini, Darryl W. Johnson

Searching for the developmental origins of sexual violence: examining the co-occurrence of physical aggression and sexual behaviors in early childhood
Patrick Lussier, Jay Healey

Assessing risk of sexually abusive behavior among youth in a child welfare sample
Robert A. Prentky, Nien-Chen Li, Sue Righthand, Ann Schuler, Deborah Cavanaugh, Austin F. Lee

Psychological mechanisms underlying support for juvenile sex offender registry laws: prototypes, moral outrage, and perceived threat
Jessica M. Salerno, Cynthia J. Najdowski, Margaret C. Stevenson, Tisha R. A. Wiley, Bette L. Bottoms, Roberto Vaca Jr., Pamela S. Pimentel

Legal, ethical, and methodological considerations in the Internet-based study of child pornography offenders
James V. Ray, Eva R. Kimonis, Christine Donoghue

June 17, 2010

Psychopathy brouhaha: It's a wrap (I hope!)

Today's Scientific American has more on the censorship controversy I've featured here in recent weeks. As regular readers know, the flap centers around allegations that psychopathy researcher Robert Hare tried to silence critics by threatening to sue. The controversial article was finally published this month in the American Psychological Association publication Psychological Assessment, but the fallout continues.

The column by J.R. Minkel, oddly titled "Fear Review," features a rundown, including commentary by prominent scholar Stephen Hart:
People familiar with the matter say the scale's author, Robert Hare of the University of British Columbia, deserves only partial blame for the delay, to be shared with the American Psychological Association (APA), the journal's publisher. But they say Hare's use of legal threats has at best subverted the peer review process that is the crux of modern scientific progress, and could at worst encourage junior researchers in the field of forensic psychology to pursue other lines of research.

"I find this action to be completely inconsistent with the man I had [great] respect and affection for," says Stephen Hart of Simon Fraser University in British Columbia, a collaborator and former student of Hare's. "People I speak with automatically think, 'Well, what's in that article that makes him so upset? What's he so afraid of?'
After reading all of the publicly available materials on the controversy, as well as numerous email posts on professional listservs, here's how I boil things down to the essence:
  • The Skeem and Cooke article is an important scientific analysis of the theoretical construct of psychopathy, which is increasingly being used as a weapon in court with grave consequences for those it is deployed against.
  • Not surprisingly, Robert Hare disagrees with Skeem and Cooke. Specifically, he rejects their claim that criminality is central to his Psychopathy Checklist and to the underlying construct of psychopathy.
  • Hare claims that Skeem and Cooke distorted his work. In a written response, he gives three examples of alleged distortions. Presumably, since he was preparing his response for publication, he picked the best examples he could find to illustrate his complaint. Yet, these are nowhere near as egregious as I had imagined they would be, given his threat to sue.
  • Hare accuses two well respected psychology-law leaders, Norm Poythress and John Petrila, of being biased and misinformed. But nothing in his response supports this. Poythress and Petrila, in their article in the International Journal of Forensic Mental Health that set this whole ball in motion, were careful not to take sides in the underlying scientific debate over psychopathy. Rather, they focused on the threat to academic freedom and science posed by threats to sue: "Academic freedom rests on the premise that advances in science can only occur if scholars are permitted to pursue free competition among ideas. This assumes that scholars have the liberty to do their work free from limitations imposed by political or religious pressure or by economic reprisals."
  • Hare has claimed elsewhere that his "lawsuit threat was meant only to get the 'attention' of APA, Skeem, and Cooke and force changes to the article." In his essay, he expresses bafflement at the ensuing, lengthy delay in the article's publication. To claim that his threat to sue did not contribute to the lengthy delay is either disingenuous or naïve. Especially in the wake of other controversies, such as the Rind debacle in which the U.S. Congress blasted the APA's publication and peer review process, the Association is undoubtedly very gun-shy and reactive over lawsuit threats.
The bottom line:

After analyzing all sides of the issue, I find that the Skeem and Cooke article is an important and timely contribution to the field, and that threats to sue over such publications set a dangerous precedent. As Poythress and Petrila point out in their commentary, potential negative effects of defamation threats against scientific researchers include -- among other things -- that:
  1. researchers avoid critical research out of fear of lawsuits,
  2. academics avoid volunteering as peer reviewers, and
  3. journal editors self-censor on controversial topics
Censorship -- or even the appearance of censorship -- is especially dangerous when it involves critique of a construct that may be used in a partisan manner in the forensic arena.

Hare is entitled to express his opinion, but nothing in his public response changes these bottom lines. Rather, as Jennifer Skeem notes in today's Scientific American piece, all of this peripheral controversy distracts from the scientific critique of psychopathy, including her critique that was silenced for three years before finally seeing the light of day.

I sure hope this is my last blog post for a while on this topic!

PRIVATE NOTE TO TODAY'S "ANONYMOUS" BLOG COMMENTER: I regret that I had to reject your comment about the pecuniary angle from publication. While I found it quite interesting, I had no easy way to substantiate its accuracy.

GENERAL NOTE TO COMMENTERS: I encourage comments, but it's nice to know who is talking; please consider signing your name (or at least a pseudonym).