
February 15, 2020

Flawed science? Two efforts launched to improve scientific validity of psychological test evidence in court

There’s this forensic psychologist, we’ll call him Dr. Harms, who is infamous for his unorthodox approach. He scampers around the country deploying a bizarre admixture of obscure, outdated and unpublished tests that no one else has ever heard of.

Oh, and the Psychopathy Checklist (PCL-R). Dr. Harms never omits that. To him, everyone is a chillingly dangerous psychopath. Even a 30-year-old whose last crime was at age 15.

What’s most bizarre about Dr. Harms’s esoteric method is that he gets away with it. Attorneys may try to challenge him in court, but their protests usually fall flat. Judges rule that any weaknesses in his method should go to the “weight” that jurors give Dr. Harms’s opinions, rather than the admissibility of his tests.

Psychological tests hold a magical allure as objective truth. They retain their luster even while forensic science techniques previously regarded as bulletproof are undergoing unprecedented scrutiny. Based in large part on our briefcases full of tests, courts have granted psychologists unprecedented influence over an ever-increasing array of thorny issues, from future dangerousness to parental fitness to refugee trauma. Behind the scenes, meanwhile, a lucrative test-production industry is gleefully rubbing its hands all the way to the bank.

In other forensic “science” niches such as bite-mark analysis and similar types of pattern matching that have contributed to wrongful convictions, appellate attorneys have had to wage grueling, decades-long efforts to rein in shoddy practice. (See Radley Balko's The Cadaver King and the Country Dentist for more on this.) But leaders in the field of forensic psychology are grabbing the bull by the horns and inviting us to do better, proposing novel ways for us to self-police.

New report slams “junk science” psychological assessments


In one of two significant developments, a group of researchers today released evidence of systematic problems with the state of psychological test admissibility in court. The researchers' comprehensive survey found that only about two-thirds of the tools used by clinicians in forensic settings were generally accepted in the field, while even fewer -- only about four in ten -- were favorably reviewed in authoritative sources such as the Mental Measurements Yearbook.

Despite this, psychological tests are rarely challenged when they are introduced in court, Tess M.S. Neal and her colleagues found. Even when they are, the challenges fail about two-thirds of the time. Worse yet, there is little relationship between a tool’s psychometric quality and the likelihood of it being challenged.

Slick ad for one of a myriad of new psych tests.
“Some of the weakest tools tend to get a pass from the courts,” write the authors of the newly issued report, “Psychological Assessments in Legal Contexts: Are Courts Keeping ‘Junk Science’ Out of the Courtroom?”

The report, currently in press in the journal Psychological Science in the Public Interest, proposes that standard batteries be developed for forensic use, based on the consensus of experts in the field as to which tests are the most reliable and valid for assessing a given psycholegal issue. It further cautions against forensic deployment of newly developed tests that are being marketed by for-profit corporations before adequate research or review by independent professionals.

"Life or death" call to halt prejudicial use of psychopathy test


In a parallel development in the field, 13 prominent forensic psychologists have issued a rare public rebuke of improper use of the controversial Psychopathy Checklist (PCL-R) in court. The group is calling for a halt to the use of the PCL-R in the sentencing phase of death-penalty cases as evidence that a convicted killer will be especially dangerous if sentenced to life in prison rather than death.

As I’ve reported previously in a series of posts (here and here, for example), scores on the PCL-R swing wildly in forensic settings based on which side hired the expert. In a phenomenon known as adversarial allegiance, prosecution-retained experts produce scores in the high-psychopathy range in about half of cases, as compared with less than one out of ten cases for defense experts.

Research does not support testimony being given by prosecution experts in capital trials that PCL-R scores can accurately predict serious violence in institutional settings such as prison, according to the newly formed Group of Concerned Forensic Mental Health Professionals. And once such a claim is made in court, its prejudicial impact on jurors is hard to overcome, potentially leading to a vote for execution.

The "Statement of Concerned Experts," whose authors include prominent professionals who helped to develop and test the PCL-R, is forthcoming from the respected journal Psychology, Public Policy, and Law.

Beware the all-powerful law of unintended consequences


This scrutiny of how psychological instruments are being used in forensic practice is much needed and long overdue. Perhaps it will eventually even trickle down to our friend Dr. Harms, although I have a feeling that won't happen before his retirement.

But never underestimate the law of unintended consequences.

The research group that surveyed psychological test use in the courts developed a complex, seemingly objective method to sort tests according to whether they were generally accepted in the field and/or favorably reviewed by independent researchers and test reviewers.

Ironically enough, one of the tests that they categorized as meeting both criteria – general acceptance and favorable review – was the PCL-R, the same test being targeted by the other consortium for its improper deployment and prejudicial impact in court. (Perhaps not so coincidentally, that test is a favorite of the aforementioned Dr. Harms, who likes to score it high.)

The disconnect illustrates the fact that science doesn’t exist in a vacuum. Psychopathy is a value-laden construct that owes its popularity in large part to current cultural values, which favor the individual-pathology model of criminal conduct over notions of rehabilitation and desistance from crime.

It’s certainly understandable why reformers would suggest the development of “standard batteries … based on the best clinical tools available.” The problem comes in deciding what is “best.”

Who will be privileged to make those choices (which will inevitably reify the dominant orthodoxy and its implicit assumptions)?

What alternatives will those choices exclude? And at whose expense?

And will that truly result in fairer and more scientifically defensible practice in the courtroom?

It’s exciting that forensic psychology leaders are drawing attention to the dark underbelly of psychological test deployment in forensic practice. But despite our best efforts, I fear that equitable solutions may remain thorny and elusive.

September 3, 2015

Adversarial allegiance: Frontier of forensic psychology research

A colleague recently commented on how favorably impressed he was by the open-mindedness of two other forensic examiners, who had had the courage to change their opinions in the face of new evidence. The two had initially recommended that a man be civilly committed as a sexually violent predator, but changed their minds three years later.

My colleague's admiration was short-lived. It evaporated when he realized that the experts’ change of heart had come only after they switched teams: Initially retained by the government, they were now in the employ of the defense.

"Adversarial allegiance" is the name of this well-known phenomenon in which some experts' opinions tend to drift toward the party retaining their services. This bias is insidious because it operates largely outside of conscious awareness, and can affect even ostensibly objective procedures such as the scoring and interpretation of standardized psychological tests.

Partisan bias is nothing new to legal observers, but formal research on its workings is in its infancy. Now, the researchers spearheading the exploration of this intriguing topic have put together a summary review of the empirical evidence they have developed over the course of the past decade. The review, by Daniel Murrie of the Institute of Law, Psychiatry and Public Policy at the University of Virginia and Marcus Boccaccini of Sam Houston State University, is forthcoming in the Annual Review of Law and Social Science.

Forensic psychologists’ growing reliance on structured assessment instruments gave Murrie and Boccaccini a way to systematically explore partisan bias. Because many forensic assessment tools boast excellent interrater reliability in the laboratory, the team could quantify the degradation of fidelity that occurs in real-world settings. And when scoring trends correlate systematically with which side the evaluator is testifying for, adversarial allegiance is a plausible culprit.
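
For readers who like to see the machinery, interrater agreement of this kind is usually indexed with an intraclass correlation coefficient (ICC). Below is a minimal sketch of that computation using the open-source pingouin library and made-up paired scores; the numbers are illustrative only, not data from any study discussed here.

    import pandas as pd
    import pingouin as pg  # third-party library: pip install pingouin

    # Made-up PCL-R total scores: five offenders, each rated by two evaluators.
    data = pd.DataFrame({
        "offender": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
        "rater":    ["A", "B"] * 5,
        "score":    [31, 25, 18, 22, 27, 20, 12, 15, 34, 29],
    })

    icc = pg.intraclass_corr(data=data, targets="offender",
                             raters="rater", ratings="score")

    # ICC2 = two-way random effects, absolute agreement, single rater --
    # the ICC(A,1) statistic commonly reported in field reliability studies.
    print(icc.loc[icc["Type"] == "ICC2", ["Type", "ICC", "CI95%"]])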

Daniel Murrie
Such bias has been especially pronounced with the Psychopathy Checklist-Revised, which is increasingly deployed as a weapon by prosecutors in cases involving future risk, such as capital murder sentencing hearings, juvenile transfer to adult courts, and sexually violent predator commitment trials. In a series of ground-breaking experiments, the Murrie-Boccaccini team found that scores on the PCL-R vary hugely and systematically based on whether an expert is retained by the prosecution or the defense, with the differences often exceeding what is statistically plausible based on chance.

Systematic bias was also found in the scoring of two measures designed to predict future sexual offending, the popular Static-99 and the now-defunct Minnesota Sex Offender Screening Tool Revised (MnSOST-R).

One shortcoming of the team’s initial observational research was that it couldn’t eliminate the possibility that savvy attorneys had preselected experts who were predisposed toward one side or the other. To test this possibility, two years ago the team designed a devious experimental study in which they recruited forensic psychologists and psychiatrists and randomly assigned them to either a prosecution or defense legal unit. To increase validity, the experts were even paid $400 a day for their services.

Marcus Boccaccini
The findings provided proof-positive of the strength of the adversarial allegiance effect. Forensic experts assigned to the bogus prosecution unit gave higher scores on both the PCL-R and the Static-99R than did those assigned to the defense. The pattern was especially pronounced on the PCL-R, due to the subjectivity of many of its items. ("Glibness" and "superficiality," for example, cannot be objectively measured.)

The research brought further bad tidings. Even when experts assign the same score on the relatively simple Static-99R instrument, they often present these scores in such a way as to exaggerate or downplay risk, depending on which side they are on. Specifically, prosecution-retained experts are far more likely to endorse use of "high-risk" norms that significantly elevate risk.

Several somewhat complementary theories have been advanced to explain why adversarial allegiance occurs. Prominent forensic psychologist Stanley Brodsky has attributed it to the social psychological process of in-group allegiance. Forensic psychologists Tess Neal and Tom Grisso have favored a more cognitive explanation, positing heuristic biases such as the human tendency to favor confirmatory over disconfirmatory information. More cynically, others have attributed partisan bias to conscious machinations in the service of earning more money. Murrie and Boccaccini remain agnostic, saying that all of these factors could play a role, depending upon the evaluator and the situation.

One glimmer of hope is that the allegiance effect is not universal. The research team found that only some of the forensic experts they studied are swayed by which side retains them. Hopefully, the burgeoning interest in adversarial allegiance will lead to future research exploring not only the individual and situational factors that trigger bias, but also what keeps some experts from shading their opinions toward the retaining party.

Even better would be if the courts took an active interest in this problem of bias. Some Australian courts, for example, have introduced a method colloquially known as "hot-tubbing," in which experts for all sides must come together and hash out their differences outside of court.

In the meantime, watch out if someone tries to recruit you at $400 a day to come and work for a newly formed legal unit. It might be another ruse, designed to see how you hold up to adversarial pressure.

* * * * *

The article is: Adversarial Allegiance among Expert Witnesses, forthcoming from The Annual Review of Law and Social Science. To request it from the first author, click HERE



September 4, 2014

More studies finding bias in PCL-R measurement of psychopathy

I've been reporting for quite some time about problems with the reliability and validity of the Psychopathy Checklist (PCL-R), a popular instrument for measuring psychopathy in forensic settings. It is a critical issue in forensic psychology, because of the massively prejudicial nature of the term "psychopath." Once a judge or jury hears that term, pretty much everything else sounds like "blah blah blah."

Now, the journal Law and Human Behavior has published two new studies -- one from the U.S. and the other from Sweden -- adding to the ever-more-persuasive line of research on PCL-R rater bias. It's high time for a critical examination of whether the PCL-R belongs in court, but I doubt that will happen anytime soon because of its efficacy for obtaining desired results. At the bottom of each abstract, I've provided contact information so that you can request the full articles from the authors.

* * * * * 

Field Reliability of the Psychopathy Checklist-Revised Among Life Sentenced Prisoners in Sweden

Joakim Sturup, John F. Edens, Karolina Sörman, Daniel Karlberg, Björn Fredriksson and Marianne Kristiansson Law and Human Behavior 2014, Vol. 38, No. 4, 315-324

ABSTRACT: Although typically described as reliable and valid, the Psychopathy Checklist-Revised (PCL-R) has come under some criticism by researchers in the last half-decade due to evidence of poor interrater reliability and adversarial allegiance being reported in applied settings in North America. This study examines the field reliability of the PCL-R using a naturalistic test–retest design among a sample of Swedish life sentenced prisoners (N = 27) who had repeatedly been assessed as part of their application to receive a reduced prison term. The prisoners, who were assessed by a team of forensic evaluators retained by an independent government authority, had spent on average 14 years in prison with a mean time from Assessment 1 to Assessment 2 of 2.33 years. The overall reliability of the PCL-R (ICC(A,1)) was .70 for the total score and .62 and .76 for Factor 1 and 2 scores, respectively. Facet 1–3 scores ranged from .54 to .60, whereas Facet 4 was much higher (.90). Reliability of individual items was quite variable, ranging from .23 to .80. In terms of potential causes of unreliability, both high and low PCL-R scores at the initial assessment tended to regress toward the mean at the time of the second evaluation. Our results are in line with previous research demonstrating concerns regarding the reliability of the PCL-R within judicial settings, even among independent evaluation teams not retained by a particular side in a case. Collectively, these findings question whether the interpersonal (Facet 1) and affective (Facet 2) features tapped by the PCL-R are reliable enough to justify their use in legal proceedings.

Request a copy from the author. 
* * * * * 

Evaluator Differences in Psychopathy Checklist-Revised Factor and Facet Scores 

Marcus T. Boccaccini, Daniel C. Murrie, Katrina A. Rufino and Brett O. Gardner Law and Human Behavior 2014, Vol. 38, No. 4, 337-345

ABSTRACT: Recent research suggests that the reliability of some measures used in forensic assessments—such as Hare’s (2003) Psychopathy Checklist-Revised (PCL-R)—tends to be weaker when applied in the field, as compared with formal research studies. Specifically, some of the score variability in the field is attributable to evaluators themselves, rather than the offenders they evaluate. We studied evaluator differences in PCL-R scoring among 558 offenders (14 evaluators) and found evidence of large evaluator differences in scoring for each PCL-R factor and facet, even after controlling for offenders’ self-reported antisocial traits. There was less evidence of evaluator differences when we limited analyses to the 11 evaluators who reported having completed a PCL-R training workshop. Findings provide indirect but positive support for the benefits of PCL-R training, but also suggest that evaluator differences may be evident to some extent in many field settings, even among trained evaluators.

Request from author.

More of my coverage of the PCL-R is available HERE. An NPR series on the controversy -- including an essay by me -- is HERE.

Hat tip: Brian Abbott

January 5, 2014

New evidence of psychopathy test's poor accuracy in court

Use of a controversial psychopathy test is skyrocketing in court, even as mounting evidence suggests that the prejudicial instrument is highly inaccurate in adversarial settings.

The latest study, published by six respected researchers in the influential journal Law and Human Behavior, explored the accuracy of the Psychopathy Checklist, or PCL-R, in Sexually Violent Predator cases around the United States.

The findings of poor reliability echo those of other recent studies in the United States, Canada and Europe, potentially heralding more admissibility challenges in court. 

Although the PCL-R is used in capital cases, parole hearings and juvenile sentencing, by far its most widespread forensic use in the United States is in Sexually Violent Predator (SVP) cases, where it is primarily invoked by prosecution experts to argue that a person is at high risk for re-offense. Building on previous research, David DeMatteo of Drexel University and colleagues surveyed U.S. case law from 2005-2011 and located 214 cases from 19 states -- with California, Texas and Minnesota accounting for more than half of the total -- that documented use of the PCL-R in such proceedings.

To determine the reliability of the instrument, the researchers examined a subset of 29 cases in which the scores of multiple evaluators were reported. On average, scores reported by prosecution experts were about five points higher than those reported by defense-retained experts. This is a large and statistically significant difference that cannot be explained by chance. 

Prosecution experts were far more likely to give scores of 30 or above, the cutoff for presumed psychopathy. Prosecution experts reported scores of 30 or above in almost half of the cases, whereas defense witnesses reported scores that high in less than 10 percent.

Looking at interrater reliability another way, the researchers applied a classification scheme from the PCL-R manual in which scores are divided into five discrete categories, from “very low” (0-8) to “very high” (33-40). In almost half of the cases, the scores given by two evaluators fell into different categories; in about one out of five cases the scores were an astonishing two or more categories apart (e.g., “very high” versus “moderate” psychopathy).
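
To make the category comparison concrete, here is a small sketch in Python. The endpoint bands ("very low" 0-8, "very high" 33-40) come from the study as quoted above; the three middle cutoffs are placeholders invented for illustration, since the manual's exact boundaries aren't reproduced in this post.

    import bisect

    # Endpoint bands per the study: "very low" = 0-8, "very high" = 33-40.
    # The middle cutoffs (16, 24, 32) are invented placeholders, NOT the
    # PCL-R manual's actual boundaries.
    UPPER_BOUNDS = [8, 16, 24, 32, 40]
    LABELS = ["very low", "low", "moderate", "high", "very high"]

    def band(score: int) -> int:
        """Index of the category band containing a PCL-R total score (0-40)."""
        return bisect.bisect_left(UPPER_BOUNDS, score)

    # Hypothetical opposing-expert score pairs (prosecution, defense).
    pairs = [(32, 25), (35, 18), (29, 27), (31, 12)]

    for pros, dfn in pairs:
        gap = abs(band(pros) - band(dfn))
        print(f"{pros} vs {dfn}: {LABELS[band(pros)]!r} vs {LABELS[band(dfn)]!r}, "
              f"{gap} categories apart")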

Surprisingly, interrater agreement was even worse among evaluators retained by the same side than among opposing experts, suggesting that the instrument’s inaccuracy is not solely due to what has been dubbed adversarial (or partisan) allegiance.

Despite its poor accuracy, the PCL-R is extremely influential in legal decision-making. The concept of psychopathy is superficially compelling in our current era of mass incarceration, and the instrument's popularity shows no sign of waning. 

Earlier this year, forensic psychologist Laura Guy and colleagues reported on its power in parole decision-making in California. The state now requires government evaluators to use the PCL-R in parole fitness evaluations for “lifers,” or prisoners sentenced to indeterminate terms of up to life in prison. Surveying several thousand cases, the researchers found that PCL-R scores were a strong predictor of release decisions by the Parole Board, with those granted parole scoring an average of about five points lower than those denied parole. Having just conducted one such evaluation, I was struck by the frightening fact -- alluded to by DeMatteo and colleagues -- that the chance assignment of an evaluator who typically gives high scores on the PCL-R “might quite literally mean the difference between an offender remaining in prison versus being released back into the community.”

Previous research has established that Factor 1 of the two-factor instrument – the factor measuring characterological traits such as manipulativeness, glibness and superficial charm – is especially prone to error in forensic settings. This is not surprising, as traits such as “glibness” are somewhat in the eye of the beholder and not objectively measurable. Yet, the authors assert, “it is exactly these traits that seem to have the most impact” on judges and juries.

Apart from the issue of poor reliability, the authors questioned the widespread use of the PCL-R as evidence of impaired volitional control, an element required for civil commitment in SVP cases. They labeled as “ironic, if not downright contradictory” the fact that psychopathy is often touted in traditional criminal responsibility (or insanity) cases as evidence of badness as opposed to mental illness, yet in SVP cases it magically transforms into evidence of a major mental disorder that interferes with self-control. 

The evidence is in: The Psychopathy Checklist-Revised is too inaccurate in applied settings to be relied upon in legal decision-making. With consistent findings of abysmal interrater reliability, its prejudicial impact clearly outweighs any probative value. However, the gatekeepers are not guarding the gates. So long as judges and attorneys ignore this growing body of empirical research, prejudicial opinions will continue to be cloaked in a false veneer of science, contributing to unjust outcomes.

* * * * *
The study is: 

The Role and Reliability of the Psychopathy Checklist-Revised in U.S. Sexually Violent Predator Evaluations: A Case Law Survey by DeMatteo, D., Edens, J. F., Galloway, M., Cox, J., Toney Smith, S. and Formon, D. (2013). Law and Human Behavior

Copies may be requested from the first author (HERE).

The same research team has just published a parallel study in Psychology, Public Policy, and Law:

“Investigating the Role of the Psychopathy Checklist-Revised in United States Case Law” by DeMatteo, David; Edens, John F.; Galloway, Meghann; Cox, Jennifer; Smith, Shannon Toney; Koller, Julie Present; Bersoff, Benjamin





October 8, 2013

Study: Risk tools don't work with psychopaths

If you want to know whether that psychopathic fellow sitting across the table from you will commit a violent crime within the next three years, you might as well flip a coin as use a violence risk assessment tool.

Popular risk assessment instruments such as the HCR-20 and the VRAG perform no better than chance in predicting risk among prisoners high in psychopathy, according to a new study published in the British Journal of Psychiatry. The study followed a large, high-risk sample of released male prisoners in England and Wales.

Risk assessment tools performed fairly well for men with no mental disorder. Utility was decreased for men diagnosed with schizophrenia or depression, became worse yet for those with substance abuse, and ranged from poor to no better than chance for individuals with personality disorders. But the instruments bombed completely when it came to men with high scores on the Psychopathy Checklist-Revised (PCL-R) (which, as regular readers of this blog know, has real-world validity problems all its own). 

"Our findings have major implications for risk assessment in criminal populations," noted study authors Jeremy Coid, Simone Ullrich and Constantinos Kallis. "Routine use of these risk assessment instruments will have major limitations in settings with high prevalence of severe personality disorder, such as secure psychiatric hospitals and prisons."

The study, "Predicting future violence among individuals with psychopathy," may be requested from the first author, Jeremy Coid (click HERE).  

August 25, 2013

Forensnips aplenty, forensnips galore

Everybody knows that the war is over
Everybody knows the good guys lost
Everybody knows the fight was fixed
The poor stay poor, the rich get rich
That's how it goes, Everybody knows

I can't seem to get Leonard Cohen’s haunting Everybody Knows out of my mind.

Perhaps it's because I was just down in Alabama, the belly of the beast, working on a tragic case. With the highest per capita rate of executions in the United States, the Heart of Dixie State kills people for crimes that other nations punish with probation. No exaggeration. It was jarring to drive around Montgomery and see the close proximity of historic mansions to abandoned homes and decaying housing projects. The juxtaposition is fitting, as Montgomery claims the dual distinctions of being the "cradle of the Confederacy" and the "birthplace of the Civil Rights Movement."

Montgomery, Alabama (c) Karen Franklin 2013
Or maybe it's a flashback to Elysium, in which the one percenters have left Earth’s teeming masses to rot away while they luxuriate on an idyllic orbiting satellite. The scene in the parole office, with a robot parole agent delivering a quick risk assessment and then pushing meds, is worth the price of admission, although the film is marred by interminable hand-to-hand combat scenes and a ridiculous Hollywood ending.

David Miranda, held hostage by British security forces

Or, it could be because I’m still riled up over the British government's abuse of David Miranda. He is the Brazilian partner of journalist Glenn Greenwald (think Edward Snowden). In what can only be called an outrageous effort to intimidate journalists, the Brits detained Miranda at Heathrow Airport for nine solid hours -- the maximum allowed under the British Terrorism Act -- before finally releasing him sans his laptop, cell phone and camera. Under the Terrorism Act, he was not entitled to counsel, nor to decline to cooperate. I sure hope it backfires and incenses journalists; it certainly fired up USA Today columnist Rem Rieder (whose column I highly recommend).

* * * * *

I feel bad about the dearth of posts recently. It's been a hectic period. I'll try to make up for my lapse by packing this post with lots of links to forensic psychology and criminology news and views from the past few weeks:

Evidence-based justice: Corrupted memory

Nature magazine's profile of Elizabeth Loftus and her decades-long crusade to expose flaws in eyewitness testimony is worth a gander.

Psychopathic criminals have empathy switch

New research published in the journal Brain indicates that psychopaths do not lack empathy, as is often claimed. Rather, they can switch it on and off at will. The study, out of the Netherlands Institute for Neuroscience, is freely available online. BBC also has coverage.  

The demographics of sexting

Sexting is becoming increasingly commonplace. But practices and meanings differ by gender, relationship and sexual identity, according to a new article, also available online, in the journal Cyberpsychology, Behavior, and Social Networking.

Brainwashed video discussion

New York Times columnist David Brooks just interviewed psychiatrist Sally Satel and psychologist Scott Lilienfeld about their new book, Brainwashed, which is getting quite a bit of media buzz. The book is a workmanlike, if a bit superficial, exploration of the allure of "mindless neuroscience." If you’ve got 65 minutes, I recommend watching the video discussion.

Prison news: Hunger strike, juveniles, the elderly, women

On the prison front, a lot has been going on. California prisoners are into Day 50 or so of their hunger strike over solitary housing (a condition that the Department of Corrections denies, despite many men being kept in segregation units for years and even decades) and other cruel conditions. With prisoners' health deteriorating, a court order has been issued allowing force feeding if necessary to forestall deaths. Mainstream media reporting has been minimal, but at least Al Jazeera's got you covered.  

Even more local to me, a lawsuit has been filed over solitary confinement of juveniles in Contra Costa County. The lawsuit, filed in U.S. District Court, accuses county officials of flouting state laws mandating that juvenile detention facilities be supportive environments designed for rehabilitation.

Meanwhile, NBC news is sounding an alarm over the increasing number of elderly people in U.S. prisons. NBC sounds mostly worried about the cost to taxpayers of prisons teeming with upwards of 400,000 elderly prisoners by the year 2030. Read it (http://usnews.nbcnews.com/_news/2013/06/29/.UeV62HppQL8.twitter) and weep.

Piper Kerman, author of the memoir Orange Is the New Black that's become a trendy Netflix series, is also sounding an alarm. In a New York Times op-ed, she writes about a federal plan to ease overcrowding in men's prisons by shipping about 1,000 women from Connecticut down to Alabama and points beyond, where they will be even more estranged from their families. As Kerman notes: "For many families these new locations might as well be the moon." I recommend her thoughtful essay on alternatives for low-risk women prisoners. 

In a more promising development, the U.S. Justice Department has announced efforts to curtail the stiff drug sentences that have caused much of this overcrowding in the first place. The U.S. prison system is so bloated, so costly, and so irrational, that even conservatives are calling for reform. Better late than never, I suppose.

By the way, Florida has executed John Errol Ferguson, the prisoner whose controversial case I blogged about earlier this year. His competency was contested in part because of his insistence that he was the "Prince of God." The American Bar Association had filed an amicus brief asking the U.S. Supreme Court to clarify the standard for competency for execution being applied in the case.

Sex offender news

In yet another in a series of registry-facilitated vigilante attacks, a South Carolina man has been arrested for killing a sex offender and his wife in the mistaken belief that the man was a child molester. At the same time, there are signs that overzealous laws that contribute to such stigmatization are being scrutinized more closely. For example, a federal judge has struck down a Colorado city's ordinance restricting where registered sex offenders can live, ruling that it conflicts with a state law requiring parolees to be reintegrated into society. An appellate panel in North Carolina has also struck down a law that banned registered sex offenders from using social media sites. The state Court of Appeals agreed with the challenger that the law violated his Constitutional rights to free speech and freedom of association. 

Dispute over expert witness credentials

Finally, there's a big brouhaha in South Dakota over the credentials of a psychologist who frequently testifies as an expert witness in child custody cases. The credentials of the widely respected psychologist, Thomas Price, became an issue during a child custody dispute. It was ascertained that he had earned his PhD in behavioral medicine from an online degree mill called Greenwich University on Norfolk Island, Australia, that was subsequently shuttered by the Australian government. According to an expert on diploma mills quoted by the Sioux Falls Argus Leader, degree mills often adopt the names of respected English universities. Price's resumé says he earned a Ph.D. in behavioral medicine from Greenwich University, without noting the Norfolk Island location. "Typically," notes the article, "people don’t get caught using an unaccredited degree until they assume a high-profile position ... or they do something that causes another person to research their backgrounds…. If you stay under the radar, you can get by."

Science blogger

Finally (this time I really mean it), for those of you who are into offbeat science, I've just added a new blog, Mike the Mad Biologist, to my blog roll (which can be found a little ways down the right column of my blog site). Mike is prolific and wide-ranging in his news links, with a creative spin. 

Hat tips to Jane, Terry, Kirk and others

March 5, 2013

Remarkable experiment proves pull of adversarial allegiance

Psychologists' scoring of forensic tools depends on which side they believe has hired them

A brilliant experiment has proven that adversarial pressures skew forensic psychologists' scoring of supposedly objective risk assessment tests, and that this "adversarial allegiance" is not due to selection bias, i.e., preexisting differences among evaluators.

The researchers duped about 100 experienced forensic psychologists into believing they were part of a large-scale forensic case consultation at the behest of either a public defender service or a specialized prosecution unit. After two days of formal training by recognized experts on two widely used forensic instruments -- the Psychopathy Checklist-R (PCL-R) and the Static-99R -- the psychologists were paid $400 to spend a third day reviewing cases and scoring subjects. The National Science Foundation picked up the $40,000 tab.

Unbeknownst to them, the psychologists were all looking at the same set of four cases. But they were "primed" to consider the case from either a defense or prosecution point of view by a research confederate, an actual attorney who pretended to work on a Sexually Violent Predator (SVP) unit. In his defense attorney guise, the confederate made mildly partisan but realistic statements such as "We try to help the court understand that ... not every sex offender really poses a high risk of reoffending." In his prosecutor role, he said, "We try to help the court understand that the offenders we bring to trial are a select group [who] are more likely than other sex offenders to reoffend." In both conditions, he hinted at future work opportunities if the consultation went well. 

The deception was so cunning that only four astute participants smelled a rat; their data were discarded.

As expected, the adversarial allegiance effect was stronger for the PCL-R, which is more subjectively scored. (Evaluators must decide, for example, whether a subject is "glib" or "superficially charming.") Scoring differences on the Static-99R only reached statistical significance in one out of the four cases.

The groundbreaking research, to be published in the journal Psychological Science, echoes previous findings by the same group regarding partisan bias in actual court cases. But by conducting a true experiment in which participants were randomly assigned to either a defense or prosecution condition, the researchers could rule out selection bias as a cause. In other words, the adversarial allegiance bias cannot be solely due to attorneys shopping around for simpatico experts, as the experimental participants were randomly assigned and had no group differences in their attitudes about civil commitment laws for sex offenders.

Sexually Violent Predator cases are an excellent arena for studying adversarial allegiance, because the typical case boils down to a "battle of the experts." Often, the only witnesses are psychologists, all of whom have reviewed essentially the same material but have differing interpretations about mental disorder and risk. In actual cases, the researchers note, the adversarial pressures are far higher than in this experiment:
"This evidence of allegiance was particularly striking because our experimental manipulation was less powerful than experts are likely to encounter in most real cases. For example, our participating experts spent only 15 minutes with the retaining attorney, whereas experts in the field may have extensive contact with retaining attorneys over weeks or months. Our experts formed opinions based on files only, which were identical across opposing experts. But experts in the field may elicit different information by seeking different collateral sources or interviewing offenders in different ways. Therefore, the pull toward allegiance in this study was relatively weak compared to the pull typical of most cases in the field. So the large group differences provide compelling evidence for adversarial allegiance."

This is just the latest in a series of stunning findings on allegiance bias by this team of psychologists, led by Daniel Murrie of the University of Virginia and Marcus Boccaccini of Sam Houston State University. The tendency of experts to skew data to fit the side that retains them should come as no big surprise. After all, it is consistent with the 2009 National Academy of Sciences report calling into question the reliability of virtually every type of forensic science evidence, including supposedly objective pattern-matching techniques such as fingerprint analysis.

Although the group's findings have heretofore been published only in academic journals and have found a limited audience outside of the profession, this might change. A Huffington Post blogger, Wray Herbert, has published a piece on the current findings, which he called "disturbing." And I predict more public interest if and when mainstream journalists and science writers learn of this extraordinary line of research.

In the latest study, Murrie and Boccaccini conducted follow-up analyses to determine how often matched pairs of experts differed in the expected direction. On the three cases in which clear allegiance effects showed up in PCL-R scoring, more than one-fourth of score pairings had differences of more than six points in the expected direction. Six points equates to about two standard errors of measurement (SEMs), a gap that should happen by chance in only 2 percent of cases. A similar, albeit milder, effect was found with the Static-99R.
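
For the curious, here is roughly how such a chance estimate can be computed, assuming normally distributed measurement error. The SEM value below is illustrative rather than the PCL-R manual's published figure, and note that conventions differ: expressing the gap directly in SEM units (as the 2 percent figure implies) gives a smaller probability than using the standard error of a difference between two scorings.

    from math import sqrt
    from statistics import NormalDist

    phi = NormalDist().cdf  # standard normal cumulative distribution

    sem = 3.0  # illustrative SEM for a PCL-R total score (not the manual's figure)
    gap = 6.0  # observed gap between opposing experts, about two SEMs

    # Convention implied by the 2 percent figure: express the gap in SEM
    # units and take the one-tailed normal probability.
    p_sem_units = 1 - phi(gap / sem)               # ~0.023, about 2 percent

    # Stricter convention: the difference of two error-prone scorings has
    # standard deviation SEM * sqrt(2), making the same gap less extreme.
    p_difference = 1 - phi(gap / (sem * sqrt(2)))  # ~0.079

    print(f"gap in SEM units:     {p_sem_units:.3f}")
    print(f"SE of the difference: {p_difference:.3f}")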

Adversarial allegiance effects might be even stronger in less structured assessment contexts, the researchers warn. For example, clinical diagnoses and assessments of emotional injuries involve even more subjective judgment than scoring of the Static-99 or PCL-R.

But ... WHICH psychologists?!


For me, this study raised a tantalizing question: Since only some of the psychologists succumbed to the allegiance effect, what distinguished those who were swayed by the partisan pressures from those who were not?

The short answer is, "Who knows?"

The researchers told me that they ran all kinds of post-hoc analyses in an effort to answer this question, and could not find a smoking gun. As in a previous research project that I blogged about, they did find evidence for individual differences in scoring of the PCL-R, with some evaluators assigning higher scores than others across all cases. However, they found nothing about individual evaluators that would explain susceptibility to adversarial allegiance. Likewise, the allegiance effect could not be attributed to a handful of grossly biased experts in the mix.

In fact, although score differences tended to go in the expected direction -- with prosecution experts giving higher scores than defense experts on both instruments -- there was a lot of variation even among the experts on the same side, and plenty of overlap between experts on opposing sides.

So, on average prosecution experts scored the PCL-R about three points higher than did the defense experts. But the scores given by experts on any given case ranged widely even within the same group. For example, in one case, prosecution experts gave PCL-R scores ranging from about 12 to 35 (out of a total of 40 possible points), with a similarly wide range among defense experts, from about 17 to 34 points. There was quite a bit of variability on scoring of the Static-99R, too; on one of the four cases, scores ranged all the way from a low of two to a high of ten (the maximum score being 12).

When the researchers debriefed the participants, the experts themselves didn't have a clue as to what caused the effect. That's likely because bias is mostly unconscious: people tend to recognize it in others but not in themselves. So, when asked about factors that make psychologists vulnerable to allegiance effects, the participants endorsed things that applied to others and not to them: Those who worked at state facilities thought private practitioners were more vulnerable; experienced evaluators thought that inexperience was the culprit. (It wasn't.)

I tend to think that greater training in how to avoid falling prey to cognitive biases (see my previous post exploring this) could make a difference. But this may be wrong; the experiment to test my hypothesis has not been run. 

The study is: "Are forensic experts biased by the side that retained them?" by Daniel C. Murrie, Marcus T. Boccaccini, Lucy A. Guarnera and Katrina Rufino, forthcoming from Psychological Science. Contact the first author (HERE) if you would like to be put on the list to receive a copy of the article as soon as it becomes available.

Click on these links for lists of my numerous prior blog posts on the PCL-R, adversarial allegiance, and other creative research by Murrie, Boccaccini and their prolific team. Among my all-time favorite experiments from this research team is: "Psychopathy: A Rorschach test for psychologists?"

February 11, 2013

Exercise: Priming students to detect covert biases

In an eye-opening exercise in my graduate forensic psychology course, I had two groups separately analyze a sanitized forensic report. The subject of the report was a 16-year-old boy named "John" who had committed a relatively minor sex offense; the evaluation issue was treatment amenability. After independent group discussions, the two groups shared their impressions as follows:

Group A: "John has a conduct disorder and is narcissistic. His misconduct appears to be escalating. There are ominous warning signs of budding psychopathy. He is at a crossroads in his life; he could go bad fast."

Group B: "This report is biased. The evaluator has joined with John's mother, and is channeling the mother's antagonism toward John. There is evidence of racism, homophobia, and political conservatism. The evaluator’s antipathy toward John feels personal – perhaps he has a wayward teenage son?"

The two groups looked across the table at each other, flabbergasted. Some suspected a trick. "Did you really give us the same report to read?" one student queried.

Yes, everyone had read the identical report. And, in case you wondered, group selection was random; there were no baseline differences that would explain the groups' divergent opinions.

Rather, the difference was in how the two groups were primed to read the report. Their instructions:

Group A: "Read the report with the goal of trying to understand John. What makes him tick? Does he have any potential clinical diagnoses? What is your prognosis for his future?"

Group B: "Read the report with the goal of trying to understand the perspective of the report writer. Do you see any problems with his method or his analysis? If so, do they suggest any potential biases?"

This was no abstract academic exercise. Channeling John’s hateful mother, this seminal report reads like something torn from the pages of an Ann Rule novel, replete with enough (uncorroborated) animal torture and arson to excite any true believer in the infamous Macdonald triad. Going unchallenged at the time, the report had a hugely prejudicial impact on decision-makers. For years to come, institutional bureaucrats and forensic experts quoted liberally from it to bolster their opinions that John was dangerous.

This is not an isolated or unusual case. Alarmist reports like this have remarkable staying power, their uncorroborated claims taking on a life of their own as they ripple through their subjects' lives, resisting rational analysis or contestation. The power of a single forensic evaluator is truly frightening at times.

Cutting through the hype


So how did a group of graduate students manage to see through the hype that had buffaloed seasoned professionals, to take the measure of the evaluator and expose his subterranean biases? Remarkably, all it took was a simple admonition to think critically, and to be alert to potential biases.

Ideally, we should always be exercising these analytical faculties. We should train ourselves to simultaneously process at least two units of analysis, asking ourselves both:

A. What does this report tell us about its subject?

B. What are the limitations of this report? How might its findings be flawed by insufficient or unreliable information, unconscious assumptions and biases, or other factors?

Cognitive biases


In the class exercise, Group A was focused only on Question A, whereas Group B focused on Question B. When forensic experts review a report, our approach should be bidirectional, and incorporate both perspectives.

Constructive skepticism benefits from an understanding of cognitive biases and how they work. In the instant case, the most obvious of these was confirmatory bias. This is the tendency to actively seek out and assign more weight to information that confirms one's prior beliefs, discounting or ignoring disconfirmatory data. Clinicians who fall under the spell of psychopathy theory, for example, tend to see psychopaths lurking behind every bush. A clue to the author's preconceptions in John's case was found in a footnote citing Stanton Samenow’s Inside the Criminal Mind, an influential but decidedly polemic treatise that vigorously disavows social factors in crime and -- as its title implies -- caricatures criminals as a breed apart from normal human beings.

Once you detect such selective perception in play, you may see related cognitive biases which the discerning expert should always be on the lookout for in forensic (and other) reports. These include, but are not limited to:

  • Salience bias, in which inordinate attention is paid to exotic or highly distinctive information, at the expense of ordinary features of a case that may be important. In John's case, the evaluator overweighted the mother's fanciful tales about John's early childhood ("He never cried like a normal baby!"), while ignoring more proximate evidence of John's confusion over his sexuality. In criminal cases, salience bias often contributes to racial stereotyping.

  • Hindsight bias, or the tendency to see events as more predictable than they were before they took place. Using hindsight, forensic experts are prone to overvalue known facts that tend to explain an event; a countermeasure is to deliberately consider information that supports alternate conclusions. 

  • Availability bias, in which the probability of an event is judged by how easy it is to think of examples. Especially when combined with ignorance of base rates, this can lead to a tendency to overpredict dramatic events, even when -- as in the case of black swans -- their likelihood is actually low. (A worked example of the base-rate problem follows this list.)

  • Illusory correlation, in which a relationship is imagined between variables that are in fact unrelated. In John's case, the mother's dramatic tales -- even if true -- may have had little or nothing to do with John's teenage misconduct. However, when read by subsequent decision-makers in a cultural climate that privileges psychopathy as an explanation for criminal conduct, they had an enormously prejudicial impact. 

(Wikipedia maintains an exhaustive list of these decision-making biases, along with links to their definitions.)
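
To see how base-rate neglect plays out numerically, here is a minimal worked example. The sensitivity, specificity and base rate are invented round numbers, not figures from any real risk instrument.

    # Invented round numbers for a hypothetical violence-prediction tool.
    sensitivity = 0.80  # P(flagged | will reoffend violently)
    specificity = 0.80  # P(not flagged | will not reoffend)
    base_rate = 0.05    # P(will reoffend violently): a rare outcome

    # Bayes' rule: of the people the tool flags, how many actually reoffend?
    p_flagged = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
    ppv = sensitivity * base_rate / p_flagged

    print(f"P(flagged)             = {p_flagged:.3f}")  # 0.230
    print(f"P(reoffends | flagged) = {ppv:.3f}")        # 0.174: most flags are false alarms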

To avoid perpetuating biases, forensic evaluators should train themselves to think like "Agent J" in Men in Black. Rather than jumping to superficially plausible conclusions, try to consciously develop alternate hypotheses and test their fit with the evidence. This scientific mindset kept Agent J (Will Smith) from assuming that little Tiffany, a blonde girl carrying quantum physics textbooks through the ghetto at night, was the innocent party just because she did not superficially resemble the monsters who were also out and about. Here is the scene from Men in Black that I show in my class, in which Agent J explains his logic in shooting Tiffany -- rather than the monsters -- during a simulation training:

May 29, 2012

SVP risk tools show 'disappointing' reliability in real-world use

Rater agreement on three instruments commonly used to assess sex offenders' risk of recidivism is much lower in practice than reported in the tools' manuals, according to a new study out of Florida.

Faring most poorly was the Psychopathy Checklist (PCL-R). Correlations of scores between two evaluators hired by the same agency were in the low range. On average, psychologists differed by five points on the instrument, which has a score range of zero to 40. In one case, two evaluators were apart by a whopping 24 points!

Agreement among evaluators was only moderate on the Static-99 and the MnSOST-R, two actuarial risk assessment instruments for which scoring is relatively more straightforward.

The study, published in the respected journal Psychological Assessment, was a collaboration between scholars from the Department of Mental Health Law and Policy at the University of South Florida and researchers with the Florida Department of Children and Families. It utilized archived records culled from the almost 35,000 individuals screened for possible Sexually Violent Predator (SVP) civil commitment in Florida between 1999 and 2009. The researchers located 315 cases in which the same individual was evaluated by separate clinicians who each administered both the PCL-R and at least one of the two actuarial measures within a short enough time frame to enable direct scoring comparisons.
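
That case-selection step -- pairing two evaluations of the same person within a short window -- is easy to picture in code. Here is a minimal sketch with pandas, under invented column names and records (the Florida dataset's actual schema is not described in this post):

    import pandas as pd

    # Hypothetical screening records; the column names and values are
    # invented, not the Florida dataset's actual schema.
    evals = pd.DataFrame({
        "person_id": [101, 101, 102, 103, 103],
        "evaluator": ["A", "B", "A", "C", "D"],
        "eval_date": pd.to_datetime(["2004-01-10", "2004-03-02", "2005-06-01",
                                     "2006-02-14", "2008-09-30"]),
        "pclr_total": [28, 22, 31, 17, 24],
    })

    # Pair each person's evaluations by two different clinicians ...
    pairs = evals.merge(evals, on="person_id", suffixes=("_1", "_2"))
    pairs = pairs[pairs["evaluator_1"] < pairs["evaluator_2"]]

    # ... and keep only pairs close enough in time for a fair comparison.
    window = pd.Timedelta(days=180)  # illustrative cutoff
    close = pairs[(pairs["eval_date_2"] - pairs["eval_date_1"]).abs() <= window]

    print(close[["person_id", "evaluator_1", "evaluator_2",
                 "pclr_total_1", "pclr_total_2"]])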

It would be a mistake to lean too heavily on the results of a single isolated study. But the present study adds to a burgeoning body of research from several groups of independent researchers, all pointing to troubling problems with the accuracy of instruments designed to forecast risk of recidivism among sex offenders.

Related study: Psychopathy and sexual deviance not predictive

Collectively, the research has been especially critical of the ability of the highly prejudicial construct of psychopathy to add meaningfully to risk prediction in this high-stakes arena. Indeed, just this week another study has come out indicating that neither psychopathy scores nor sexual deviance measures improve on the accuracy provided by an actuarial instrument alone.

An especially interesting finding of that Canadian study is that reoffense rates were still below 12 percent over a 6-year followup period for even the most high-risk offenders -- those with high risk ratings on the Static-99R plus high levels of psychopathy and sexual deviance (as measured by phallometric testing). This makes it inappropriate to inflate risk estimates over and above those derived from Static-99R scores alone, the authors caution.

Item-level analysis finds varying rates of accuracy

A unique contribution of the Florida study is its analysis of the relative accuracy of every single item in each of the three instruments studied. Handy tables allow a forensic practitioner to see which items have the poorest reliability, meaning they should be viewed skeptically by forensic decision-makers.

For example, take the MnSOST-R, a now-defunct instrument with a score range of –14 to 31 points. The total gap between evaluators was as wide as 19 points; the items with the greatest variability in scoring were those pertaining to offenders' functioning during incarceration, such as participation in treatment.

Meanwhile, the weak performance of the Psychopathy Checklist owes much to the items on its so-called “Factor 1,” which attempt to measure the personality style of the psychopath. As I've discussed before, rating someone as “glib,” “callous” or “shallow” is a highly subjective enterprise that opens the door to a veritable avalanche of personal bias.

Piggy-backing off a recommendation by John Edens and colleagues, the Florida team suggests that the prejudicial deployment of the Psychopathy Checklist may be superfluous, in that scores on Factor 2 alone (the items reflecting a chronic criminal lifestyle) are more predictive of future violence or sexual recidivism.

Next up, we need to identify the causes of the poor interrater reliability for forensic risk prediction instruments in real-world settings. Is it due to inadequate training, differing clinical skills, variable access to collateral data, intentional or unintentional bias on the part of examiners, adversarial allegiance effects (not a factor in the present study, since both evaluators were appointed by the same agency), or some combination?

In the meantime, the fact that two evaluators working on the same side cannot reliably arrive at the same risk rating for any particular individual should certainly raise our skepticism about the validity of risk prediction based on these instruments.

The studies are:

Reliability of Risk Assessment Measures Used in Sexually Violent Predator Proceedings. Cailey Miller, Eva Kimonis, Randy Otto, Suzonne Kline and Adam Wasserman. Psychological Assessment. Published online 7 May 2012. Advance online publication. Click HERE to contact the authors.

Does Consideration of Psychopathy and Sexual Deviance Add to the Predictive Validity of the Static-99R? Jan Looman, Nicola A. C. Morphett and Jeff Abracen. International Journal of Offender Therapy and Comparative Criminology. Published online 28 May 2012. Click HERE to contact the authors.

