
September 4, 2014

More studies finding bias in PCL-R measurement of psychopathy

I've been reporting for quite some time about problems with the reliability and validity of the Psychopathy Checklist (PCL-R), a popular instrument for measuring psychopathy in forensic settings. It is a critical issue in forensic psychology, because of the massively prejudicial nature of the term "psychopath." Once a judge or jury hears that term, pretty much everything else sounds like "blah blah blah."

Now, the journal Law and Human Behavior has published two new studies -- one from the U.S. and the other from Sweden -- adding to the ever-more-persuasive line of research on PCL-R rater bias. It's high time for a critical examination of whether the PCL-R belongs in court, but I doubt that will happen anytime soon because of its efficacy for obtaining desired results. At the bottom of each abstract, I've provided contact information so that you can request the full articles from the authors.

* * * * * 

Field Reliability of the Psychopathy Checklist-Revised Among Life Sentenced Prisoners in Sweden

Joakim Sturup, John F. Edens, Karolina Sörman, Daniel Karlberg, Björn Fredriksson and Marianne Kristiansson Law and Human Behavior 2014, Vol. 38, No. 4, 315-324

ABSTRACT: Although typically described as reliable and valid, the Psychopathy Checklist-Revised (PCL-R) has come under some criticism by researchers in the last half-decade due to evidence of poor interrater reliability and adversarial allegiance being reported in applied settings in North America. This study examines the field reliability of the PCL-R using a naturalistic test–retest design among a sample of Swedish life sentenced prisoners (N = 27) who had repeatedly been assessed as part of their application to receive a reduced prison term. The prisoners, who were assessed by a team of forensic evaluators retained by an independent government authority, had spent on average 14 years in prison with a mean time from Assessment 1 to Assessment 2 of 2.33 years. The overall reliability of the PCL-R (ICC(A,1)) was .70 for the total score and .62 and .76 for Factor 1 and 2 scores, respectively. Facet 1–3 scores ranged from .54 to .60, whereas Facet 4 was much higher (.90). Reliability of individual items was quite variable, ranging from .23 to .80. In terms of potential causes of unreliability, both high and low PCL-R scores at the initial assessment tended to regress toward the mean at the time of the second evaluation. Our results are in line with previous research demonstrating concerns regarding the reliability of the PCL-R within judicial settings, even among independent evaluation teams not retained by a particular side in a case. Collectively, these findings question whether the interpersonal (Facet 1) and affective (Facet 2) features tapped by the PCL-R are reliable enough to justify their use in legal proceedings.
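For readers unfamiliar with the ICC(A,1) statistic reported above -- a two-way, absolute-agreement, single-rater intraclass correlation -- here is a minimal sketch of how such a coefficient can be computed. The scores are invented for illustration, and the pingouin library labels this coefficient "ICC2" in its output:

```python
# Minimal sketch with invented PCL-R totals: computing a two-way,
# absolute-agreement, single-measure intraclass correlation -- the
# ICC(A,1) reported in the abstract -- via the pingouin library.
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    "prisoner":    [1, 1, 2, 2, 3, 3, 4, 4],           # rated targets
    "assessment":  ["t1", "t2"] * 4,                    # first vs. second evaluation
    "pcl_r_total": [24, 20, 31, 28, 12, 16, 27, 25],    # hypothetical total scores
})

icc = pg.intraclass_corr(data=df, targets="prisoner",
                         raters="assessment", ratings="pcl_r_total")
# pingouin's "ICC2" row corresponds to ICC(A,1)
print(icc.loc[icc["Type"] == "ICC2", ["Type", "ICC", "CI95%"]])
```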

Request a copy from the author. 
* * * * * 

Evaluator Differences in Psychopathy Checklist-Revised Factor and Facet Scores 

Marcus T. Boccaccini, Daniel C. Murrie, Katrina A. Rufino and Brett O. Gardner Law and Human Behavior 2014, Vol. 38, No. 4, 337-345

ABSTRACT: Recent research suggests that the reliability of some measures used in forensic assessments—such as Hare’s (2003) Psychopathy Checklist-Revised (PCL-R)—tends to be weaker when applied in the field, as compared with formal research studies. Specifically, some of the score variability in the field is attributable to evaluators themselves, rather than the offenders they evaluate. We studied evaluator differences in PCL-R scoring among 558 offenders (14 evaluators) and found evidence of large evaluator differences in scoring for each PCL-R factor and facet, even after controlling for offenders’ self-reported antisocial traits. There was less evidence of evaluator differences when we limited analyses to the 11 evaluators who reported having completed a PCL-R training workshop. Findings provide indirect but positive support for the benefits of PCL-R training, but also suggest that evaluator differences may be evident to some extent in many field settings, even among trained evaluators.

Request from author.

More of my coverage of the PCL-R is available HERE. An NPR series on the controversy -- including an essay by me -- is HERE.

Hat tip: Brian Abbott

October 30, 2011

Study: Psychopathy score fails to predict sexual recidivism

Many forensic psychologists believe that psychopathy is a risk factor for sex offender recidivism. Not surprisingly, when forensic psychologists assign a sex offender a high score on a psychopathy test, it increases the risk of extreme legal sanctions such as civil commitment.

But a new study out of Texas found zero correlation between sexual recidivism and psychopathy, as measured by the widely used Psychopathy Checklist (PCL-R). If anything, sex offenders who were arrested for a new sexually violent offense tended to have lower scores on the PCL-R than those who were not rearrested!

Regular blog readers should be familiar with these researchers by now: Dan Murrie, Marcus Boccaccini and crew are the same scholars who informed us of the partisan allegiance phenomenon, in which evaluators working for the government tend to assign far higher PCL-R scores than do those working for the defense.

In their most recent study, they examined PCL-R scores from about 400 sex offenders in Texas who were released from prison and then tracked for anywhere from about two to seven years. They examined not just the total scores on the PCL-R, but also scores on the instrument's two factors, as well as four so-called facet scores. Not one of these seven PCL-R variables was a statistically significant predictor of whether a man would be arrested for a new sex crime.

“Overall, these predictive validity findings were striking because the PCL-R apparently failed to predict the type of outcome (i.e., sexually violent reoffense) for which it was administered in this context,” the authors noted.

Further, in cases in which the PCL-R was administered by more than one evaluator, the study found poor agreement between the two, even though both were working for the government. Consistent with prior research, interrater agreement was higher on Factor 2, which measures antisocial behavior and an impulsive lifestyle, than on Factor 1, which measures the vaguely operationalized personality and emotional dynamics thought to underlie psychopathy.

In an interesting twist, the researchers tried to determine whether some evaluators were more accurate than others at predicting recidivism through PCL-R scores. They identified four highly prolific evaluators; together, these three psychologists and one medical doctor had provided almost two-thirds of the PCL-R scores in the study. Although the PCL-R scores of three of these four evaluators were more likely than other evaluators' scores to correlate with a new arrest for a non-sexual crime, even these evaluators could not produce PCL-R scores that predicted sexual offense recidivism.

Despite the PCL-R’s lack of predictive validity, sex offenders with higher PCL-R scores were more likely than others to be recommended for civil commitment, indicating that the unreliable rating was far from harmless in forensic practice.

The study is: 

Murrie, D. C., Boccaccini, M. T., Caperton, J. and Rufino, K. Field Validity of the Psychopathy Checklist–Revised in Sex Offender Risk Assessment. Psychological Assessment. Click HERE to request a copy from the first author, at the Institute of Law, Psychiatry, and Public Policy.


January 5, 2014

New evidence of psychopathy test's poor accuracy in court

Use of a controversial psychopathy test is skyrocketing in court, even as mounting evidence suggests that the prejudicial instrument is highly inaccurate in adversarial settings.

The latest study, published by six respected researchers in the influential journal Law and Human Behavior, explored the accuracy of the Psychopathy Checklist, or PCL-R, in Sexually Violent Predator cases around the United States.

The findings of poor reliability echo those of other recent studies in the United States, Canada and Europe, potentially heralding more admissibility challenges in court. 

Although the PCL-R is used in capital cases, parole hearings and juvenile sentencing, by far its most widespread forensic use in the United States is in Sexually Violent Predator (SVP) cases, where it is primarily invoked by prosecution experts to argue that a person is at high risk for re-offense. Building on previous research, David DeMatteo of Drexel University and colleagues surveyed U.S. case law from 2005-2011 and located 214 cases from 19 states -- with California, Texas and Minnesota accounting for more than half of the total -- that documented use of the PCL-R in such proceedings.

To determine the reliability of the instrument, the researchers examined a subset of 29 cases in which the scores of multiple evaluators were reported. On average, scores reported by prosecution experts were about five points higher than those reported by defense-retained experts. This is a large and statistically significant difference that cannot be explained by chance. 

Prosecution experts were far more likely to give scores of 30 or above, the cutoff for presumed psychopathy. Prosecution experts reported scores of 30 or above in almost half of the cases, whereas defense witnesses reported scores that high in less than 10 percent.

Looking at interrater reliability another way, the researchers applied a classification scheme from the PCL-R manual in which scores are divided into five discrete categories, from “very low” (0-8) to “very high” (33-40). In almost half of the cases, the scores given by two evaluators fell into different categories; in about one out of five cases the scores were an astonishing two or more categories apart (e.g., “very high” versus “moderate” psychopathy).

Surprisingly, interrater agreement was even worse among evaluators retained by the same side than among opposing experts, suggesting that the instrument’s inaccuracy is not solely due to what has been dubbed adversarial (or partisan) allegiance.

Despite its poor accuracy, the PCL-R is extremely influential in legal decision-making. The concept of psychopathy is superficially compelling in our current era of mass incarceration, and the instrument's popularity shows no sign of waning. 

Earlier this year, forensic psychologist Laura Guy and colleagues reported on its power in parole decision-making in California. The state now requires government evaluators to use the PCL-R in parole fitness evaluations for “lifers,” or prisoners sentenced to indeterminate terms of up to life in prison. Surveying several thousand cases, the researchers found that PCL-R scores were a strong predictor of release decisions by the Parole Board, with those granted parole scoring an average of about five points lower than those denied. Having just conducted one such evaluation, I was struck by the frightening fact -- alluded to by DeMatteo and colleagues -- that the chance assignment of an evaluator who typically gives high scores on the PCL-R “might quite literally mean the difference between an offender remaining in prison versus being released back into the community.”

Previous research has established that Factor 1 of the two-factor instrument – the factor measuring characterological traits such as manipulativeness, glibness and superficial charm – is especially prone to error in forensic settings. This is not surprising, as traits such as “glibness” are somewhat in the eye of the beholder and not objectively measurable. Yet, the authors assert, “it is exactly these traits that seem to have the most impact” on judges and juries.

Apart from the issue of poor reliability, the authors questioned the widespread use of the PCL-R as evidence of impaired volitional control, an element required for civil commitment in SVP cases. They labeled as “ironic, if not downright contradictory” the fact that psychopathy is often touted in traditional criminal responsibility (or insanity) cases as evidence of badness as opposed to mental illness, yet in SVP cases it magically transforms into evidence of a major mental disorder that interferes with self-control. 

The evidence is in: The Psychopathy Checklist-Revised is too inaccurate in applied settings to be relied upon in legal decision-making. With consistent findings of abysmal interrater reliability, its prejudicial impact clearly outweighs any probative value. However, the gatekeepers are not guarding the gates. So long as judges and attorneys ignore this growing body of empirical research, prejudicial opinions will continue to be cloaked in a false veneer of science, contributing to unjust outcomes.

* * * * *
The study is: 

The Role and Reliability of the Psychopathy Checklist-Revised in U.S. Sexually Violent Predator Evaluations: A Case Law Survey by DeMatteo, D., Edens, J. F., Galloway, M., Cox, J., Toney Smith, S. and Formon, D. (2013). Law and Human Behavior

Copies may be requested from the first author (HERE).

The same research team has just published a parallel study in Psychology, Public Policy and Law:

“Investigating the Role of the Psychopathy Checklist-Revised in United States Case Law” by DeMatteo, David; Edens, John F.; Galloway, Meghann; Cox, Jennifer; Smith, Shannon Toney; Koller, Julie Present; Bersoff, Benjamin


September 3, 2010

Metaanalysis debunks psychopathy-violence link

No clear winner among violence risk tools

If you are looking for the best tool to assess someone's risk for violence, the array may seem confusing. Lots of acronyms, lots of statistical data about AUCs (Areas Under the Curve) and the like. What to do?

No worries. As it turns out, they're pretty much interchangeable. That is the bottom-line finding of a groundbreaking metaanalytic study in the APA journal Psychological Bulletin by three academic researchers from the United Kingdom.

The University of Nottingham researchers used sophisticated statistical tools to meta-analyze multiple studies on the accuracy of nine leading violence risk assessment tools. All nine turned out to have similarly moderate predictive accuracy, with none clearly leading the pack. And none -- the scholars warned -- were sufficiently accurate for courts to rely upon them as a primary basis for decision-making in forensic cases requiring "a high level of predictive accuracy, such as preventive detention."

Widely touted PCL-R's "Factor 1" a bust

In a result with potentially momentous implications for forensic practitioners, the researchers found that Factor 1 of the Psychopathy Checklist-Revised (PCL-R) does not predict violence. As you know, Factor 1 purports to measure the core constellation of a psychopathic personality (superficial charm, manipulativeness, lack of empathy, etc.). When introduced in court, evidence of psychopathy has an enormously prejudicial impact on criminal offenders.

But, the PCL-R's much-ballyhooed ability to predict certain types of violence owes only to the instrument's second factor, according to the metaanalysis by researchers Min Yang, Steve Wong, and Jeremy Coid. And that's no surprise. After all, Factor 2 measures the criminogenic factors (criminality, irresponsibility, impulsivity, history of delinquency, etc.) that even a fifth-grader knows are bad signs for a future of law-abiding citizenship.

In my experience, the Factor 1 items -- the ones purporting to measure an underlying personality profile -- are the ones more likely to be inflated by some evaluators. That's because many of these items are pretty subjective. Glib? Superficially charming? If you don't like a guy -- and/or he doesn't like you -- you are more likely to rate these negative items as present. That's one of my hypotheses for the large evaluator differences and partisan allegiance effects found with the PCL-R in forensic practice.

Cumulatively, the emerging PCL-R findings raise the question:

Why introduce the Psychopathy Checklist in court if other violence risk tools work just as well, without the implicitly prejudicial effect of labeling someone as a "psychopath"?

Psychopathy evidence skyrocketing in juvenile cases

Despite (or perhaps because of, in some cases) its prejudicial impact, the construct of psychopathy is increasingly being introduced in court cases involving juveniles. It is often used to infer that a youth should get a longer sentence because he or she is dangerous and not amenable to treatment.

[Chart: Skyrocketing use of psychopathy evidence in juvenile cases. Source: Viljoen et al., Psychology, Public Policy, and Law (2010)]


The first systematic review, published in the current issue of Psychology, Public Policy, and Law, found the use of psychopathy evidence against juveniles skyrocketing in both Canada and the United States. Psychopathy evidence is typically introduced when juveniles are being sentenced as adults and in sex offender commitment cases. It is also introduced in a variety of other cases, including ones involving disputed confessions, competency to stand trial, and criminal responsibility, report authors Jodi Viljoen, Emily MacDougall, Nathalie Gagnon, and Kevin Douglas.

In one egregious case showing how judges may improperly use evidence of psychopathy, a Canadian judge reasoned that a youth's "psychopathic device [sic] score" showed that under his "shy and unassuming" exterior lurked "a monster" that "at any time ... may well come alive." As a result, the judge sentenced this minor to an adult penitentiary.

Such inferences of unremitting danger and untreatability are improper. A large proportion of youths who measure high in psychopathy score lower on psychopathy instruments once they mature. And so-called psychopathic youths are far from untreatable; in one recent study by Michael Caldwell and colleagues, after intensive treatment youths who scored high in psychopathy were actually less likely to recidivate than a comparison group in a juvenile jail.

"[T]he introduction of psychopathy evidence into juvenile forensic contexts has been somewhat rushed and premature at times," the authors conclude.

Have risk prediction tools hit the ceiling?

Researchers have been toiling for almost five decades to perfect risk prediction tools. Unfortunately, they keep running into an insurmountable obstacle: A large proportion of violence is situational, affected by environmental context and not just by qualities internal to the individual. On top of that, rare events are inherently difficult to predict.

Based on their metaanalytic findings, the UK researchers say maybe it's time to stop searching for the holy grail. Maybe we've reached the ceiling of predictive efficacy.
Violent behavior is the result of the individual interacting with the immediate environment. Although it may be possible to improve on our understanding and predicting what an individual may do in hypothetical situations, it will be much more difficult to predict the situation that an individual actually encounters in the open community. Even predicting violence within an institutional environment is difficult, where the assessor has much more information about that environment.
Instead, they say, it is time to turn our attentions to interventions that can reduce risk:
Building a better model of violence prediction should not be the sole aim of risk prediction research, which is just one link in the risk assessment-prediction-management triad that aims to achieve violence reduction and improved mental health…The risk, need and responsivity principles derived from the theory of the psychology of criminal conduct provide a useful theoretical framework for risk reduction intervention. Appropriate risk assessment can identify high-risk individuals in need of more intensive management and intervention…. Using tools with dynamic risk predictors to assess risk can identify appropriate changeable treatment targets linked to violence.
The studies included in the metaanalysis were from six countries: the United Kingdom (11), Canada (9), Sweden (3), the United States (3), Holland (2), and Germany (1). The instruments included the PCL-R, the PCL:SV, the HCR-20, the VRAG, the OGRS, the RM2000V, the LSI/LSI-R, the GSIR, and the VRS, as well as seven instrument subscales: PCL-R Factor 1 and Factor 2; the 10-item Historical subscale, the five-item Clinical subscale, and the five-item Risk Management subscale of the HCR-20; and the Static and Dynamic scales of the VRS.

Dr. Wong, former Research Director at the Regional Psychiatric Centre in Saskatoon, Saskatchewan, studied psychopathy and high-risk offenders for 25 years and developed the Violent Risk Scale and the Violence Risk Scale-sexual offender version before becoming a special professor at the Institute of Mental Health at the University of Nottingham. Dr. Yang is a professor of medical statistics with the Faculty of Medicine and Health Sciences at the University of Nottingham. And Dr. Coid, Director of the Forensic Psychiatry Research Unit, is principal investigator of the UK Home Office’s Prisoner Cohort Study and also studies the epidemiology of violent and criminal behavior at the population level.


February 15, 2020

Flawed science? Two efforts launched to improve scientific validity of psychological test evidence in court

There’s this forensic psychologist, we’ll call him Dr. Harms, who is infamous for his unorthodox approach. He scampers around the country deploying a bizarre admixture of obscure, outdated and unpublished tests that no one else has ever heard of.

Oh, and the Psychopathy Checklist (PCL-R). Dr. Harms never omits that. To him, everyone is a chillingly dangerous psychopath. Even a 30-year-old whose last crime was at age 15.

What’s most bizarre about Dr. Harms’s esoteric method is that he gets away with it. Attorneys may try to challenge him in court, but their protests usually fall flat. Judges rule that any weaknesses in his method should go to the “weight” that jurors give Dr. Harms’s opinions, rather than the admissibility of his tests.

Psychological tests hold a magical allure as objective truth. They retain their luster even while forensic science techniques previously regarded as bulletproof are undergoing unprecedented scrutiny. Based in large part on our briefcases full of tests, courts have granted psychologists unprecedented influence over an ever-increasing array of thorny issues, from future dangerousness to parental fitness to refugee trauma. Behind the scenes, meanwhile, a lucrative test-production industry is gleefully rubbing its hands all the way to the bank.

In other forensic “science” niches such as bite-mark analysis and similar types of pattern matching that have contributed to wrongful convictions, appellate attorneys have had to wage grueling, decades-long efforts to rein in shoddy practice. (See Radley Balko's The Cadaver King and the Country Dentist for more on this.) But leaders in the field of forensic psychology are grabbing the bull by the horns and inviting us to do better, proposing novel ways for us to self-police.

New report slams “junk science” psychological assessments


In one of two significant developments, a group of researchers today released evidence of systematic problems with the state of psychological test admissibility in court. The researchers' comprehensive survey found that only about two-thirds of the tools used by clinicians in forensic settings were generally accepted in the field, while even fewer -- only about four in ten -- were favorably reviewed in authoritative sources such as the Mental Measurements Yearbook.

Despite this, psychological tests are rarely challenged when they are introduced in court, Tess M.S. Neal and her colleagues found. Even when they are, the challenges fail about two-thirds of the time. Worse yet, there is little relationship between a tool’s psychometric quality and the likelihood of it being challenged.

[Image: Slick ad for one of a myriad of new psych tests.]
“Some of the weakest tools tend to get a pass from the courts,” write the authors of the newly issued report, “Psychological Assessments in Legal Contexts: Are Courts Keeping ‘Junk Science’ Out of the Courtroom?”

The report, currently in press in the journal Psychological Science in the Public Interest, proposes that standard batteries be developed for forensic use, based on the consensus of experts in the field as to which tests are the most reliable and valid for assessing a given psycholegal issue. It further cautions against forensic deployment of newly developed tests that are being marketed by for-profit corporations before adequate research or review by independent professionals.

"Life or death" call to halt prejudicial use of psychopathy test


In a parallel development in the field, 13 prominent forensic psychologists have issued a rare public rebuke of improper use of the controversial Psychopathy Checklist (PCL-R) in court. The group is calling for a halt to the use of the PCL-R in the sentencing phase of death-penalty cases as evidence that a convicted killer will be especially dangerous if sentenced to life in prison rather than death.

As I’ve reported previously in a series of posts (here and here, for example), scores on the PCL-R swing wildly in forensic settings based on which side hired the expert. In a phenomenon known as adversarial allegiance, prosecution-retained experts produce scores in the high-psychopathy range in about half of cases, as compared with less than one out of ten cases for defense experts.

Research does not support testimony being given by prosecution experts in capital trials that PCL-R scores can accurately predict serious violence in institutional settings such as prison, according to the newly formed Group of Concerned Forensic Mental Health Professionals. And once such a claim is made in court, its prejudicial impact on jurors is hard to overcome, potentially leading to a vote for execution.

The "Statement of Concerned Experts," whose authors include prominent professionals who helped to develop and test the PCL-R, is forthcoming from the respected journal Psychology, Public Policy, and Law.

Beware the all-powerful law of unintended consequences


This scrutiny of how psychological instruments are being used in forensic practice is much needed and long overdue. Perhaps eventually it may even trickle down to our friend Dr. Harms, although I have a feeling it won't be before his retirement.

But never underestimate the law of unintended consequences.

The research group that surveyed psychological test use in the courts developed a complex, seemingly objective method to sort tests according to whether they were generally accepted in the field and/or favorably reviewed by independent researchers and test reviewers.

Ironically enough, one of the tests that they categorized as meeting both criteria – general acceptance and favorable review – was the PCL-R, the same test being targeted by the other consortium for its improper deployment and prejudicial impact in court. (Perhaps not so coincidentally, that test is a favorite of the aforementioned Dr. Harms, who likes to score it high.)

The disconnect illustrates the fact that science doesn’t exist in a vacuum. Psychopathy is a value-laden construct that owes its popularity in large part to current cultural values, which favor the individual-pathology model of criminal conduct over notions of rehabilitation and desistance from crime.

It’s certainly understandable why reformers would suggest the development of “standard batteries … based on the best clinical tools available.” The problem comes in deciding what is “best.”

Who will be privileged to make those choices (which will inevitably reify the dominant orthodoxy and its implicit assumptions)?

What alternatives will those choices exclude? And at whose expense?

And will that truly result in fairer and more scientifically defensible practice in the courtroom?

It’s exciting that forensic psychology leaders are drawing attention to the dark underbelly of psychological test deployment in forensic practice. But despite our best efforts, I fear that equitable solutions may remain thorny and elusive.

March 5, 2013

Remarkable experiment proves pull of adversarial allegiance

 Psychologists' scoring of forensic tools depends on which side they believe has hired them

A brilliant experiment has proven that adversarial pressures skew forensic psychologists' scoring of supposedly objective risk assessment tests, and that this "adversarial allegiance" is not due to selection bias or preexisting differences among evaluators.

The researchers duped about 100 experienced forensic psychologists into believing they were part of a large-scale forensic case consultation at the behest of either a public defender service or a specialized prosecution unit. After two days of formal training by recognized experts on two widely used forensic instruments -- the Psychopathy Checklist-R (PCL-R) and the Static-99R -- the psychologists were paid $400 to spend a third day reviewing cases and scoring subjects. The National Science Foundation picked up the $40,000 tab.

Unbeknownst to them, the psychologists were all looking at the same set of four cases. But they were "primed" to consider the case from either a defense or prosecution point of view by a research confederate, an actual attorney who pretended to work on a Sexually Violent Predator (SVP) unit. In his defense attorney guise, the confederate made mildly partisan but realistic statements such as "We try to help the court understand that ... not every sex offender really poses a high risk of reoffending." In his prosecutor role, he said, "We try to help the court understand that the offenders we bring to trial are a select group [who] are more likely than other sex offenders to reoffend." In both conditions, he hinted at future work opportunities if the consultation went well. 

The deception was so cunning that only four astute participants smelled a rat; their data were discarded.

As expected, the adversarial allegiance effect was stronger for the PCL-R, which is more subjectively scored. (Evaluators must decide, for example, whether a subject is "glib" or "superficially charming.") Scoring differences on the Static-99R only reached statistical significance in one out of the four cases.

The groundbreaking research, to be published in the journal Psychological Science, echoes previous findings by the same group regarding partisan bias in actual court cases. But by conducting a true experiment in which participants were randomly assigned to either a defense or prosecution condition, the researchers could rule out selection bias as a cause. In other words, the adversarial allegiance bias cannot be solely due to attorneys shopping around for simpatico experts, as the experimental participants were randomly assigned and had no group differences in their attitudes about civil commitment laws for sex offenders.

Sexually Violent Predator cases are an excellent arena for studying adversarial allegiance, because the typical case boils down to a "battle of the experts." Often, the only witnesses are psychologists, all of whom have reviewed essentially the same material but have differing interpretations about mental disorder and risk. In actual cases, the researchers note, the adversarial pressures are far higher than in this experiment:
"This evidence of allegiance was particularly striking because our experimental manipulation was less powerful than experts are likely to encounter in most real cases. For example, our participating experts spent only 15 minutes with the retaining attorney, whereas experts in the field may have extensive contact with retaining attorneys over weeks or months. Our experts formed opinions based on files only, which were identical across opposing experts. But experts in the field may elicit different information by seeking different collateral sources or interviewing offenders in different ways. Therefore, the pull toward allegiance in this study was relatively weak compared to the pull typical of most cases in the field. So the large group differences provide compelling evidence for adversarial allegiance."

This is just the latest in a series of stunning findings by this team of psychologists, led by Daniel Murrie of the University of Virginia and Marcus Boccaccini of Sam Houston State University, on an allegiance bias among psychologists. The tendency of experts to skew data to fit the side that retains them should come as no big surprise. After all, it is consistent with 2009 findings by the National Academy of Sciences calling into question the reliability of all types of forensic science evidence, including supposedly more objective techniques such as DNA typing and fingerprint analysis.

Although the group's findings have heretofore been published only in academic journals and have found a limited audience outside of the profession, this might change. A Huffington Post blogger, Wray Herbert, has published a piece on the current findings, which he called "disturbing." And I predict more public interest if and when mainstream journalists and science writers learn of this extraordinary line of research.

In the latest study, Murrie and Boccaccini conducted follow-up analyses to determine how often matched pairs of experts differed in the expected direction. On the three cases in which clear allegiance effects showed up in PCL-R scoring, more than one-fourth of score pairings had differences of more than six points in the expected direction. Six points equates to about two standard errors of measurement (SEMs), which should happen by chance in only 2 percent of cases. A similar, albeit milder, effect was found with the Static-99R.
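For readers who like to see the arithmetic, here is a minimal sketch of where a figure like that 2 percent comes from. It is my own illustration, not the authors' code, and it assumes rater disagreements follow a normal distribution and that the six-point gap equals about two standard errors of the difference between two raters' scores:

```python
# Minimal sketch (an illustration, not the study's code): how often would
# a score gap of about two standard errors of the difference, in the
# expected direction, arise by chance under a normal model?
from scipy.stats import norm

z = 2.0                      # six PCL-R points expressed in standard errors
p_directional = norm.sf(z)   # one-tailed (directional) upper-tail probability
print(f"Chance of a gap this large in the expected direction: {p_directional:.1%}")
# Prints ~2.3%, in line with the "about 2 percent" figure cited above.
```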

Adversarial allegiance effects might be even stronger in less structured assessment contexts, the researchers warn. For example, clinical diagnoses and assessments of emotional injuries involve even more subjective judgment than scoring of the Static-99 or PCL-R.

But ... WHICH psychologists?!


For me, this study raised a tantalizing question: Since only some of the psychologists succumbed to the allegiance effect, what distinguished those who were swayed by the partisan pressures from those who were not?

The short answer is, "Who knows?"

The researchers told me that they ran all kinds of post-hoc analyses in an effort to answer this question, and could not find a smoking gun. As in a previous research project that I blogged about, they did find evidence for individual differences in scoring of the PCL-R, with some evaluators assigning higher scores than others across all cases. However, they found nothing about individual evaluators that would explain susceptibility to adversarial allegiance. Likewise, the allegiance effect could not be attributed to a handful of grossly biased experts in the mix.

In fact, although score differences tended to go in the expected direction -- with prosecution experts giving higher scores than defense experts on both instruments -- there was a lot of variation even among the experts on the same side, and plenty of overlap between experts on opposing sides.

So, on average prosecution experts scored the PCL-R about three points higher than did the defense experts. But the scores given by experts on any given case ranged widely even within the same group. For example, in one case, prosecution experts gave PCL-R scores ranging from about 12 to 35 (out of a total of 40 possible points), with a similarly wide range among defense experts, from about 17 to 34 points. There was quite a bit of variability on scoring of the Static-99R, too; on one of the four cases, scores ranged all the way from a low of two to a high of ten (the maximum score being 12).

When the researchers debriefed the participants, the participants themselves didn't have a clue as to what caused the effect. That's likely because bias is mostly unconscious, and people tend to recognize it in others but not in themselves. So, when asked about factors that make psychologists vulnerable to allegiance effects, the participants endorsed things that applied to others and not to them: Those who worked at state facilities thought private practitioners were more vulnerable; experienced evaluators thought that inexperience was the culprit. (It wasn't.)

I tend to think that greater training in how to avoid falling prey to cognitive biases (see my previous post exploring this) could make a difference. But this may be wrong; the experiment to test my hypothesis has not been run. 

The study is: "Are forensic experts biased by the side that retained them?" by Daniel C. Murrie, Marcus T. Boccaccini, Lucy A. Guarnera and Katrina Rufino, forthcoming from Psychological Science. Contact the first author (HERE) if you would like to be put on the list to receive a copy of the article as soon as it becomes available.

Click on these links for lists of my numerous prior blog posts on the PCL-R, adversarial allegiance, and other creative research by Murrie, Boccaccini and their prolific team. Among my all-time favorite experiments from this research team is: "Psychopathy: A Rorschach test for psychologists?"

June 10, 2010

Psychopathy controversy goes primetime

More than a million people worldwide will get a chance to learn about psychology's internal controversy over psychopathy tomorrow, when Science publishes an article on the censorship allegations that I blogged about May 30.

Perhaps not coincidentally, just as the June 11 issue of the world's leading scientific news outlet hits the presses, the American Psychological Association is suddenly publishing the disputed article that was sidetracked for more than three years.

Forensic psychologists Jennifer Skeem and David Cooke submitted the contested article to Psychological Assessment in 2006. It was peer reviewed, accepted, and scheduled for publication in 2007, but was derailed after Robert Hare, inventor of the Psychopathy Checklist (PCL), threatened to sue for defamation.

As you will remember from my previous blog post, the controversy surfaced in an opinion piece last month in the International Journal of Forensic Mental Health by two psychology-law leaders.

"[T]he threat of litigation constitutes a serious threat to academic freedom and potentially to scientific progress," wrote attorney John Petrila and psychologist Norman Poythress. "Academic freedom rests on the premise that advances in science can only occur if scholars are permitted to pursue free competition among ideas. This assumes that scholars have the liberty to do their work free from limitations imposed by political or religious pressure or by economic reprisals."

Hare now says he is "upset colleagues are suggesting he squelched academic debate," Science writer John Travis reports, as his "lawsuit threat was meant only to get the 'attention' of APA, Skeem, and Cooke and force changes to the article."

The Science report is a sidebar to a larger piece on reform efforts over plaintiff-friendly libel laws in the United Kingdom. That country's laws, in which the defendant bears the burden of proof, are under fire from around the world over their allegedly chilling effect on scientific research on controversial topics. Critics say they encourage "libel tourism," in which corporations sue there over alleged offenses that occurred elsewhere.

PCL-R reification hampering science

The contested article by Skeem and Cooke, "Is Criminal Behavior a Central Component of Psychopathy? Conceptual Directions for Resolving the Debate," posits that the field of forensic psychology has prematurely embraced Hare's Psychopathy Checklist-Revised (PCL-R) as the gold standard for psychopathy, due in large part to legal demands for a tool to predict violence. Yet the PCL-R's ability to predict violent recidivism owes in large part to its conflation of the supposed personality construct of psychopathy with past criminal behavior, they argue:
[T]he modern justice context has created a strong demand for identifying bad, dangerous people…. [The] link between the PCL and violence has supported a myth that emotionally detached psychopaths callously use violence to achieve control over and exploit others. As far as the PCL is concerned, this notion rests on virtually no empirical support…. [T]he process of understanding psychopathy must be separated from the enterprise of predicting violence.
Criminal behavior weighs heavily in the PCL's 20 items because the instrument emerged from research with prisoners. But using the PCL-R's consequent ability to predict violence to assert the theoretical validity of its underlying personality construct is a tautological, or circular, argument, claim Skeem and Cooke. Or, as John Ellard put it more directly back in 1998:
"Why has this man done these terrible things? Because he is a psychopath. And how do you know that he is a psychopath? Because he has done these terrible things."
Rebuttal and response

Alongside the critique, Psychological Assessment has published a rebuttal by Robert Hare and Craig Neumann, along with a surrebuttal by Cooke and Skeem. Hare and Neumann accuse the critics of erecting a straw-man argument and misrepresenting their work:
The very title of their article is a straw man based on the unfounded claim that Hare and his colleagues consider criminality to be central or fundamental to the psychopathy construct. Their claim is bolstered by arguments misconstruing our published work and that of others and by quotes of our work that have been taken out of context or reconstructed in such a way that it appears that we have said something that we did not say. Skeem and Cooke also made highly selective use of the literature, often omitting published studies that directly contradict or do not support the points they attempted to make, particularly with respect to the role of antisocial tendencies in clinical and empirical conceptions of psychopathy. These tactics are inconsistent with their tutorial on the philosophy of science, compromise their arguments, and divert attention from any legitimate issues raised in their article. We contend that Skeem and Cooke did the field a disservice by presenting an inaccurate account of the role of the PCL–R in theory and research on psychopathy, both applied and basic.
I encourage readers to analyze all three papers, along with the two reports in Science, and draw your own conclusions.

The current issue of Psychological Assessment contains another article pertaining to the controversial psychopathy construct. In their abstract of "Validity of Rorschach Inkblot scores for discriminating psychopaths from nonpsychopaths in forensic populations: A meta-analysis," authors James Wood, Scott Lilienfeld and colleagues assert:
Gacono and Meloy (2009) have concluded that the Rorschach Inkblot Test is a sensitive instrument with which to discriminate psychopaths from nonpsychopaths. We examined the association of psychopathy with 37 Rorschach variables in a meta-analytic review of 173 validity coefficients derived from 22 studies comprising 780 forensic participants…. The present findings contradict the view that the Rorschach is a clinically sensitive instrument for discriminating psychopaths from nonpsychopaths.

March 25, 2011

Psychopathy: A Rorschach test for psychologists?

  • Compassion
  • Empathy
  • Impulsivity
  • Excitement-seeking
What do these personality traits have in common?

If you are high on any or all of them, you may be less likely to rate other people as psychopathic on the Psychopathy Checklist (PCL-R).

The PCL-R is the most widely used measure of psychopathy in the world. But in real-world forensic settings, scores vary widely depending upon which side retained the evaluator. This finding is called the "partisan allegiance" effect.

In a new twist, the same researchers who brought you partisan allegiance have found that an evaluator's personality may impact her judgments of psychopathy. Evaluators low on compassion and thrill-seeking as measured by a widely used personality test, the NEO Personality Inventory-Revised, are more likely than others to rate criminals as psychopathic.

That’s ironic, because according to the theory of psychopathy, it's supposed to be the psychopath -- not the psychologist -- who has a deficit in empathy.

The exploratory study, forthcoming in the journal Assessment, was based on a small sample of 22 individuals who were given nine hours of training by a clinical psychologist with substantial research and forensic practice experience with the PCL-R. "The daylong session was an attempt to replicate typical PCL-R training procedures," the study authors explain.

The researchers emphasize that their findings are preliminary and need to be replicated and extended. But if they hold up, they have intriguing implications not only for the psychopathy measure but also for other psychological tests with elements of subjectivity in scoring or interpretation.

The study did not examine the accuracy of the low versus high scorers. But if low-scoring evaluators are more empathetic, this implies that they may be more accurate in interpersonal assessment contexts.  

Subterranean class conflict?

Future research might examine class background, race and philosophical beliefs to see if these influence scoring of the Psychopathy Checklist. In my informal observations, professionals who look for psychopaths under every rock tend to lack understanding of, or empathy for, those on the bottom.

Here's how that looks in practice:

The upper middle-class professional walks into the evaluation room, oblivious to the blinders and unconscious biases she brings to the table. Her subject, in contrast, is far from oblivious. With his more acute empathetic skills, the lower-class or minority individual accurately reads the professional's bias against him, which she transmits through nonverbal and other deniable cues. He also realizes that she holds all the power, and that her judgments will affect his future in very tangible ways.

He reacts with defensiveness, suspicion, or muted hostility -- especially if she is working for "the other side." But not recognizing his reaction as part of an interactional dance that she herself set in motion, the evaluator interprets his stance as evidence of intrinsic personality defect. She may see him as glib, superficially charming, conning, or manipulative -- all facets of Factor 1 (the personality dimension) on the Psychopathy Checklist.

In this interaction, all of the power belongs to the person who gets to do the labeling. Scoring and labeling the offender becomes a circular process through which the evaluator -- especially when primed by adversarial allegiance -- can project her own class- or race-based prejudices, distancing herself from the evil other, while at the same time denying complicity. An obfuscating tautology is proffered as a simple explanation for complex and multi-determined antisocial acts.

There is more to the construct of psychopathy, of course. I focus here on its potential subjectivity because this is a facet that proponents rarely acknowledge, especially in public. Forensic experts owe a duty to explain the subjectivity of the PCL-R when it is introduced in court, where the label "psychopath" can be the kiss of death. When labeled as psychopaths:
  • Juveniles are more likely to face harsh punishment in adult court.
  • Sex offenders are more likely to be preventively detained.
  • Capital murder defendants are more likely to receive the death penalty.
So, the next time a promising young student proposes to study psychopathy or "the criminal mind," you might give her a gentle nudge in a more fruitful direction: Rather than treading this tired old path, she might contribute more to the field by studying the psyches of professionals who assign such diagnostic labels in the first place. 

June 19, 2011

Violence risk meta-meta: Instrument choice does matter

Despite popularity, psychopathy test and actuarials not superior to other prediction methods 

The past couple of decades have seen an explosion of interest in forensic assessment of risk for future violent and sexual recidivism. Accordingly, evaluators can now choose from an array of more than 120 different risk assessment tools. But should this choice be based on individual preference, or are some instruments clearly superior to others?

Several systematic reviews and metaanalyses have addressed this question, but their conclusions often conflict. In the first systematic review of these reviews (called a “meta-review”), Jay Singh and Seena Fazel of Oxford University found that methodological shortcomings may contribute to the confusion. Problems they identified in the 40 metaanalyses and reviews they studied included authors' failure to adequately describe their study search procedures, failure to check for overlapping samples or publication bias, and failure to investigate the confound of sample heterogeneity.

The Oxford scholars, along with Martin Grann of Sweden's Centre for Violence Prevention, set out to rectify this problem via a more methodologically rigorous meta-review, using optimal data analyses and reporting procedures. For this purpose, they used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), a 27-item checklist designed to enable transparent and consistent reporting of results.

For their meta-meta (a metaanalysis of the metaanalyses), they collected data from 68 studies involving about 26,000 participants in 13 countries, focusing on the accuracy of the nine most commonly used forensic risk assessment instruments:
  • Psychopathy Checklist (PCL-R)
  • Static-99
  • Historical, Clinical, Risk Management-20 (HCR-20)
  • Violence Risk Appraisal Guide (VRAG)
  • Sexual Violence Risk-20 (SVR-20)
  • Level of Service Inventory (LSI-R)
  • Sex Offender Risk Appraisal Guide (SORAG)
  • Spousal Assault Risk Assessment (SARA)
  • Structured Assessment of Violence Risk in Youth (SAVRY)
Big differences in predictive validity

As it turns out, these widely used instruments vary substantially in predictive accuracy. Performing the best was the SAVRY, a risk assessment instrument designed for use with adolescents. At the bottom were the Level of Service Inventory and the Psychopathy Checklist. This is not too surprising, as the LSI-R is used with a wide variety of general offenders, and the PCL-R was not designed for risk prediction in the first place.



The present metaanalysis would therefore argue against the view of some experts that the PCL-R is unparalleled in its ability to predict future offending.

Statistical method matters: DOR outperforms AUC

The researchers compared several different methods of measuring predictive accuracy. They found that a popular statistic called the Area Under the Curve (AUC) was the weakest. Use of the AUC statistic may help to explain why some metaanalyses were unable to find significant differences among instruments, the authors theorize.

Better methods for comparing instruments’ predictive accuracy include calculating positive and negative predictive values and also using something called the Diagnostic Odds Ratio, or DOR. This is the ratio of the odds of a positive test result in an offender (true positive) relative to the odds of a positive result in a non-offender (false positive). The authors’ summary performance scores pooled results from all four statistical methods.
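To make the DOR concrete, here is a minimal sketch with a hypothetical 2x2 outcome table (the counts are invented for illustration, not taken from the metaanalysis):

```python
# Hypothetical follow-up results for a risk tool (invented counts):
#                    reoffended    did not reoffend
# test positive          30               20
# test negative          10               40
tp, fp, fn, tn = 30, 20, 10, 40

odds_pos_offenders = tp / fn        # odds of a positive result among reoffenders
odds_pos_nonoffenders = fp / tn     # odds of a positive result among non-reoffenders
dor = odds_pos_offenders / odds_pos_nonoffenders  # equivalently (tp * tn) / (fp * fn)

print(f"Diagnostic odds ratio: {dor:.1f}")  # 6.0; a DOR of 1 means no discrimination
```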

Actuarials not superior; race also matters

The poor performance of the Psychopathy Checklist (PCL-R) was not the only finding that may surprise some forensic evaluators. The researchers also found no evidence that actuarial tools – such as the widely touted Static-99 – outperform structured clinical judgment methods like the HCR-20 or the SVR-20.

They also found that an offender's race is critical to predictive accuracy. Risk assessment instruments perform best on white offenders, most likely because white offenders predominate in the underlying studies. This is consistent with other research, including a study by Dernevick and colleagues finding that risk assessment instruments are poor at predicting misconduct in terrorists.

Caution is therefore warranted when using any risk assessment tool to predict offending in samples dissimilar to their validation samples, the authors stress.

This systematic review appears to be the most methodologically rigorous such study to date, in a rapidly evolving field. I recommend obtaining both articles (see below for author contact information) and studying them carefully. The stakes are high, and it behooves us to use the instruments that are the most accurate for the specific purpose at hand.

The studies are: