IN THE NEWS

March 5, 2013

Remarkable experiment proves pull of adversarial allegiance

Psychologists' scoring of forensic tools depends on which side they believe has hired them

A brilliant experiment has proven that adversarial pressures skew forensic psychologists' scoring of supposedly objective risk assessment tests, and that this "adversarial allegiance" is not due to selection bias, or preexisting differences among evaluators.

The researchers duped about 100 experienced forensic psychologists into believing they were part of a large-scale forensic case consultation at the behest of either a public defender service or a specialized prosecution unit. After two days of formal training by recognized experts on two widely used forensic instruments -- the Psychopathy Checklist-R (PCL-R) and the Static-99R -- the psychologists were paid $400 to spend a third day reviewing cases and scoring subjects. The National Science Foundation picked up the $40,000 tab.

Unbeknownst to them, the psychologists were all looking at the same set of four cases. But they were "primed" to consider the case from either a defense or prosecution point of view by a research confederate, an actual attorney who pretended to work on a Sexually Violent Predator (SVP) unit. In his defense attorney guise, the confederate made mildly partisan but realistic statements such as "We try to help the court understand that ... not every sex offender really poses a high risk of reoffending." In his prosecutor role, he said, "We try to help the court understand that the offenders we bring to trial are a select group [who] are more likely than other sex offenders to reoffend." In both conditions, he hinted at future work opportunities if the consultation went well.

The deception was so cunning that only four astute participants smelled a rat; their data were discarded.

As expected, the adversarial allegiance effect was stronger for the PCL-R, which is more subjectively scored. (Evaluators must decide, for example, whether a subject is "glib" or "superficially charming.") Scoring differences on the Static-99R only reached statistical significance in one out of the four cases.

The groundbreaking research, to be published in the journal Psychological Science, echoes previous findings by the same group regarding partisan bias in actual court cases. But by conducting a true experiment in which participants were randomly assigned to either a defense or prosecution condition, the researchers could rule out selection bias as a cause. In other words, the adversarial allegiance bias cannot be solely due to attorneys shopping around for simpatico experts, as the experimental participants were randomly assigned and had no group differences in their attitudes about civil commitment laws for sex offenders.

Sexually Violent Predator cases are an excellent arena for studying adversarial allegiance, because the typical case boils down to a "battle of the experts." Often, the only witnesses are psychologists, all of whom have reviewed essentially the same material but have differing interpretations about mental disorder and risk. In actual cases, the researchers note, the adversarial pressures are far higher than in this experiment:

"This evidence of allegiance was particularly striking because our experimental manipulation was less powerful than experts are likely to encounter in most real cases. For example, our participating experts spent only 15 minutes with the retaining attorney, whereas experts in the field may have extensive contact with retaining attorneys over weeks or months. Our experts formed opinions based on files only, which were identical across opposing experts. But experts in the field may elicit different information by seeking different collateral sources or interviewing offenders in different ways. Therefore, the pull toward allegiance in this study was relatively weak compared to the pull typical of most cases in the field. So the large group differences provide compelling evidence for adversarial allegiance."

This is just the latest in a series of stunning findings by this team of psychologists led by Daniel Murrie of the University of Virginia and Marcus Boccaccini of Sam Houston State University on an allegiance bias among psychologists. The tendency of experts to skew data to fit the side that retains them should come as no big surprise. After all, it is consistent with 2009 findings by the National Academy of Sciences calling into question the reliability of all types of forensic science evidence, including supposedly more objective techniques such as DNA typing and fingerprint analysis.

Although the group's findings have heretofore been published only in academic journals and have found a limited audience outside of the profession, this might change. A Huffington Post blogger, Wray Herbert, has published a piece on the current findings, which he called "disturbing." And I predict more public interest if and when mainstream journalists and science writers learn of this extraordinary line of research.

In the latest study, Murrie and Boccaccini conducted follow-up analyses to determine how often matched pairs of experts differed in the expected direction. On the three cases in which clear allegiance effects showed up in PCL-R scoring, more than one-fourth of score pairings had differences of more than six points in the expected direction. Six points equates to about two standard errors of measurement (SEMs), a difference that should happen by chance in only 2 percent of cases. A similar, albeit milder, effect was found with the Static-99R.
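For readers who want to check the "2 percent" figure, here is a minimal sketch of the arithmetic. It assumes scoring error is roughly normal and treats the six-point gap as a single deviation of two SEMs, as the paragraph above does; the SEM of three points is only what "six points is about two SEMs" implies, not a value taken from the study itself.

```python
import math

def upper_tail_probability(k_sems: float) -> float:
    """Probability that a standard normal deviate exceeds k_sems (one-tailed)."""
    return 0.5 * math.erfc(k_sems / math.sqrt(2))

# Assumption: an SEM of about 3 points, implied by "six points ~ two SEMs".
ASSUMED_SEM = 3.0
difference_in_points = 6.0

k = difference_in_points / ASSUMED_SEM
p = upper_tail_probability(k)
print(f"{k:.1f} SEMs in the expected direction: chance probability {p:.3f}")
```

Under those assumptions the chance probability comes out a little over 2 percent, which is why a six-point gap showing up in more than a quarter of pairings is so striking.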

Adversarial allegiance effects might be even stronger in less structured assessment contexts, the researchers warn. For example, clinical diagnoses and assessments of emotional injuries involve even more subjective judgment than scoring of the Static-99 or PCL-R.

But ... WHICH psychologists?!

For me, this study raised a tantalizing question: Since only some of the psychologists succumbed to the allegiance effect, what distinguished those who were swayed by the partisan pressures from those who were not?

The short answer is, "Who knows?"

The researchers told me that they ran all kinds of post-hoc analyses in an effort to answer this question, and could not find a smoking gun. As in a previous research project that I blogged about, they did find evidence for individual differences in scoring of the PCL-R, with some evaluators assigning higher scores than others across all cases. However, they found nothing about individual evaluators that would explain susceptibility to adversarial allegiance. Likewise, the allegiance effect could not be attributed to a handful of grossly biased experts in the mix.

In fact, although score differences tended to go in the expected direction -- with prosecution experts giving higher scores than defense experts on both instruments -- there was a lot of variation even among the experts on the same side, and plenty of overlap between experts on opposing sides.

So, on average prosecution experts scored the PCL-R about three points higher than did the defense experts. But the scores given by experts on any given case ranged widely even within the same group. For example, in one case, prosecution experts gave PCL-R scores ranging from about 12 to 35 (out of a total of 40 possible points), with a similarly wide range among defense experts, from about 17 to 34 points. There was quite a bit of variability on scoring of the Static-99R, too; on one of the four cases, scores ranged all the way from a low of two to a high of ten (the maximum score being 12).

When the researchers debriefed the participants themselves, they didn't have a clue as to what caused the effect. That's likely because bias is mostly unconscious, and people tend to recognize it in others but not in themselves. So, when asked about factors that make psychologists vulnerable to allegiance effects, the participants endorsed things that applied to others and not to them: Those who worked at state facilities thought private practitioners were more vulnerable; experienced evaluators thought that inexperience was the culprit. (It wasn't.)

I tend to think that greater training in how to avoid falling prey to cognitive biases (see my previous post exploring this) could make a difference. But this may be wrong; the experiment to test my hypothesis has not been run.

The study is: "Are forensic experts biased by the side that retained them?" by Daniel C. Murrie, Marcus T. Boccaccini, Lucy A. Guarnera and Katrina Rufino, forthcoming from Psychological Science. Contact the first author (HERE) if you would like to be put on the list to receive a copy of the article as soon as it becomes available.

Click on these links for lists of my numerous prior blog posts on the PCL-R, adversarial allegiance, and other creative research by Murrie, Boccaccini and their prolific team. Among my all-time favorite experiments from this research team is: "Psychopathy: A Rorschach test for psychologists?"

March 3, 2013

God's Jury: Exploring Inquisitions, then and now

The word "Inquisition" harkens back to medieval Europe - Italy, Spain, Portugal and the Catholic Church. But in Cullen Murphy's frightening account, that repressive past was only prologue: The self-propagating bureaucracies of the modern world contain the seeds of inquisitions potentially far vaster and more destructive than anything wrought by the Catholic Church.

Murphy seamlessly traces the 700-year history of successive Catholic Inquisitions to expose their underlying mechanisms and highlight the fundamental similarities between then and now. The "enhanced interrogation" practiced at Guantanamo is not so different from the Roman rigoroso esame (rigorous examination), he explains. Indeed, modern interrogation methods as outlined in a U.S. Army manual eerily parallel the sophisticated techniques first outlined in an inquisition manual from the 1300s.

Inquisition waterboarding

Murphy, himself a Catholic, encourages us to broaden our historical lens to see that inquisitions need not necessarily be religious. They can occur any time members of a dominant group - whether religious, political, corporate or national - appoint themselves "God’s jury," believing that they alone are privy to the true and right path. The "inquisitorial impulse" springs directly from moral certainty. Think about the inquisitions over the last century alone, just in the United States: the Palmer Raids (an early Red Scare led by the young J. Edgar Hoover), the Japanese internment, COINTELPRO, the Patriot Act. The McCarthy Era alone was more far-reaching than any church inquisition, he argues.

But inquisitions require certain tangible assets, and it is these that the modern world possesses in abundance:

A bureaucratic machinery: Bureaucracies are self-perpetuating and expansionistic. They require no evil conspiracy at the helm. Take the Transportation Security Administration, whose methods since 9/11 have grown ever more "invasive, mindless, and routine": A single "credible tip" can get one's name added to the 440,000 on a secret terrorism watch list; but people are not allowed to find out if their names are on that official list. Shades of the inquisition? Repressive regimes are, at base, record-keeping regimes.

Surveillance: As far back as 1796, philosopher Johann Gottlieb Fichte noted that "the chief principle of a well-regulated police state" was the ability to identify its citizens and keep track of their activities and whereabouts. Murphy shows how the modern surveillance state has expanded to new heights in the wake of 9/11, especially in the United States and in England. As a British surveillance leader justifies it, "If you've got nothing to hide, you've got nothing to fear." The game of surveillance, says Murphy, ratchets forever upward, so that what was heretofore unimaginable is constantly becoming the new normal.

Censorship: Just as the Vatican has its catalogues of banned books (which Murphy spent time examining), the Internet has its "choke points" that can be manipulated to deny the public access to information. Less obvious but no less sinister are today's "Möbius strips of the like-minded," creating an "epistemic closure" in which people are able to avoid exposure to information that might challenge their assumed realities.

Whereas both the targets of an inquisition and the motives of the inquisitors can shift with time and place, these tangible underpinnings - proof of identity, efficient record-keeping, a network of informers, surveillance, denunciations, interrogations - remain constant. And they are all ubiquitous in the modern world.

The history lessons Murphy is able to impart in God's Jury owe in part to the Vatican's decision to open its archives (although only up to 1939) to outside scrutiny, an unprecedented boon to scholars. Murphy is a fluid writer, and his descriptions of the archives and their contents contain so many riveting nuggets that the book's pages fairly turn themselves.

Forensic psychologists may be especially interested in his description of interrogations and false confessions, so parallel in many ways to what we witness today in style, if not in content. Armed with a manual on witchcraft, Malleus Maleficarum (which Murphy describes as a cross between Monty Python and Mein Kampf), inquisitors sallied far and wide in search of purported witches, whom they coerced through now-familiar techniques of shaping to admit to such things as having sex with the devil.

God's Jury is unsettling. But Murphy does offer a ray of hope. Just as the inquisitions of yesteryear were extinguished by the Enlightenment ("the intellectual equivalent of habitat destruction"), Murphy maintains that there is a remedy for contemporary inquisitions. He does not believe they can be legislated away, although more power to those who are valiantly trying to place legal limits on repression. Rather, he believes that "the most effective ally" against inquisitionism is the "seventh virtue" of humility. Inquisitions can only occur, he argues, when those in power insist with absolute certainty that they hold the one and only absolute truth, and that everyone else is wrong.

If you found this review worthwhile, I would greatly appreciate your taking just a moment to go to my Amazon review (click HERE), and click on the "YES" button at the bottom (this review was helpful). This will help boost the review's ranking. Thanks in advance.

Of related interest: NPR's "The inquisition: A model for modern interrogators," which includes a downloadable podcast and an excerpt from God's Jury

February 28, 2013

A tale of two prison systems: Whither the future?

Group therapy, San Quentin Prison, California

California's beleaguered prison system got more bad headlines today for suppressing a report warning that prison suicide-watch practices were actually fostering suicide. The suppressed report, by a national expert on prison suicide, described suicidal prisoners being stripped of their clothes, placed in “safety smocks,” and then held for days "in dim, dirty, airless cells with unsanitized mattresses on the floor," according to today's Los Angeles Times. The horrific conditions encouraged prisoners "to declare they were no longer suicidal just to escape the holding cells. Many of them took their own lives soon after."

The state directed its consultant, Lindsay Hayes, to write a sanitized version of his report to give to a court monitor and lawyers for prisoners, according to court records reviewed by Times reporter Paige St. John. And when prisoner lawyers were nonetheless able to get a copy of the full report, which called the treatment of suicidal prisoners "punitive" and "anti-therapeutic," the state made an unsuccessful effort to have a judge order the report destroyed.

There were 32 prison suicides in California in 2012, a rate above the national average in the United States.

Convict sunbathing on porch of his bungalow,
Bastoy Prison (photo credit: Marco Di Lauro)

Meanwhile, more than 5,000 miles and an ocean away, sits a peaceful island prison which has not seen a single suicide in its two decades of operation. Bastoy, an island prison in Norway, with no bars or concertina fences, bills itself as "the first ecological prison in the world."

It might not seem fair to compare California prisons with those in Norway, a small and homogeneous nation with only 4,000 prisoners all told. But Norway's forward-looking penal philosophy is worth a gander. The idea is to build people up into productive citizens, rather than to tear them down. To "generate hope instead of despair" in the words of Erwin James, himself a former prison lifer in the UK who recently toured Bastoy and wrote about it for the Guardian.

Debarking from the ferry, James found an atmosphere more akin to a religious commune than the British prisons he was accustomed to. "There is a sense of peace about the place," he wrote, describing the brightly painted wooden bungalows where the island's 115 prisoners live in groups of up to six, cooking their own meals with money earned from prison jobs and food purchased at the "well-stocked mini-supermarket."

A quick dip after work, Bastoy Prison

Norway has no death penalty or life sentences; the maximum sentence is 21 years. Prisoners can apply to Bastoy when they are down to the last five years of their sentences. They must commit to non-violence and a drug- and alcohol-free lifestyle.

Who wouldn't take a deal like that, to live in an idyllic beach resort while learning the life skills necessary to reintegrate into society? Even when the sea ice was frozen solid last winter, not a single convict walked away.

"In closed prisons we keep them locked up for some years and then let them back out, not having had any real responsibility for working or cooking," explains director Arne Nilsen, a clinical psychologist. "In the law, being sent to prison is nothing to do with putting you in a terrible prison to make you suffer. The punishment is that you lose your freedom. If we treat people like animals when they are in prison they are likely to behave like animals. Here we pay attention to you as human beings."

The proof of Norway's philosophy is in the pudding: Bastoy's re-offense rate of just 10 percent is by far the lowest in Europe. Compare that to California, where seven out of ten released prisoners bounce back into custody within three years, the highest rate in the United States.

One of the guards showing James around the island looks at him with disbelief when he tells her that prison officer training in the UK lasts only six weeks. In Norway, the training takes three years. Here in California, meanwhile, basic training lasts 16 weeks, with a focus on "effective use of force," "restraint devices" and "cell searches."

Ad for prison suicide smock

And what, pray tell, are the guards in Norway spending all of that time studying?

"There is so much to learn about the people who come to prison," the guard explains to James. "We need to try to understand how they became criminals, and then help them to change."

With a rehabilitative philosophy like that, let's just hope that Bastoy -- and not California or the UK -- represents the way of the future. After all, by treating prisoners with respect and humanity, Norway is also creating a safer world.

Hat tip: Jane

February 26, 2013

Tipping points: Of life, death and psychological data

Forensic psychologists and the machinery of execution

Andre Thomas, Texas

When Andre Thomas killed his wife and children, he was careful to use three different knives so that "the blood from each body would not cross-contaminate, thereby ensuring that the demons inside each of them would die," as Marc Bookman explained it in an eloquent Mother Jones report. Then, he cut out their hearts and went to the police station to confess. While awaiting trial, he cut out one of his eyes. Later, he cut out the other, eating it in order to keep the government from using it to spy on his mind.

In response to changing social mores and international condemnation (only a handful of countries remain in the business of killing their wayward citizens), the U.S. Supreme Court in 2002 exempted the mentally retarded from execution, following up three years later by exempting juveniles. With this narrowing of the contours of capital punishment, the question of how mentally impaired one must be to avoid execution is increasingly in the forefront. That makes severe mental illness "the next frontier" of capital jurisprudence, in the words of psychology-law scholar Bruce Winick.

How insane?

Executing the floridly insane constitutes cruel and unusual punishment, barred under the Eighth Amendment of the U.S. Constitution. However, the "Ford standard" for competency to be executed is very low; a condemned person need merely understand the link between his crime and his punishment. In Thomas's case, the government insists that he is not insane enough to be spared, despite chronic auditory hallucinations, delusions, and treatment for paranoid schizophrenia.

Making this case especially ironic is that Thomas has become a poster child for the need for new laws allowing preemptive detention of people whose mental illness makes them dangerous. "At least twice in the three weeks before the crime, Thomas had sought mental health treatment," reports the Texas Tribune in a series on mental health and the criminal justice system. "On two occasions, staff members at the medical facilities were so worried that his psychosis made him a threat to himself or others that they sought emergency detention warrants for him. Despite talk of suicide and bizarre biblical delusions, he was not detained for treatment."

John Errol Ferguson, Florida

With the U.S. Supreme Court declining to draw a bright line, the question of exactly how rational a condemned prisoner's understanding must be in order for an execution to proceed has become central to legal appeals by psychotic prisoners like Thomas. Another current example is the case of John Errol Ferguson, a mass killer in Florida whose October execution was stayed due to concerns about his mental state. Ferguson's long history of paranoid schizophrenia is undisputed; the question is whether his grandiose and religious delusions interfere with his understanding that the state is going to kill him for his crimes, and that when he dies he will be, well, dead.

Ferguson's lawyers have argued that the killer lacks rational understanding, because he believes he is "the Prince of God" and will be returned to Earth post-execution to save the world from a communist plot. The state of Florida counters that all that is required to be competent for execution is that a prisoner have an "awareness" that he is set to be executed for crimes he committed. To resolve the dispute, Florida's governor appointed a panel of experts to collectively evaluate Ferguson; a lower court also heard extensive testimony from prison personnel and other mental health experts, including malingering expert Richard Rogers, who administered a large battery of malingering tests and opined that Ferguson was not faking mental illness. Ultimately, the circuit court found little to distinguish Ferguson's belief system from typical religious ideation:

"There is no evidence in the record that Ferguson’s belief as to his role in the world and what may happen to him in the afterlife is so significantly different from beliefs other Christians may hold so as to consider it a sign of insanity."

How intellectually impaired?

Meanwhile, with the categorical exemption of prisoners with mental retardation from the death row rosters, courts around the nation are seeing pitched battles over intelligence scores that can make the difference between life and death. On each side of the IQ Wars in so-called Atkins hearings (named for the 2002 U.S. Supreme Court decision barring execution of the developmentally disabled) are neuropsychologists whose testimony delves into the technicalities of margins of error, practice effects, and the now-familiar Flynn Effect. This latter phenomenon of IQ inflation, in which scores on any given IQ test rise by about three points per decade, creates a situation in which a person on the cusp of mental retardation might score over 70 -- making him eligible for execution -- on an older IQ test but not on a newer one.
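The Flynn Effect argument boils down to simple arithmetic. The sketch below is only an illustration of that arithmetic, not a method endorsed by any court or test publisher: the function name, the example years and the example score are all hypothetical, and the rate of 0.3 points per year is just "about three points per decade" restated.

```python
# Assumption: the rate quoted above, "about three points per decade".
FLYNN_POINTS_PER_YEAR = 0.3

def flynn_adjusted_score(observed: float, normed_year: int, tested_year: int) -> float:
    """Subtract the estimated inflation accumulated since the test was normed."""
    years_elapsed = max(0, tested_year - normed_year)
    return observed - FLYNN_POINTS_PER_YEAR * years_elapsed

# Hypothetical example: a score of 73 on a test normed 20 years before it was given.
adjusted = flynn_adjusted_score(observed=73, normed_year=1980, tested_year=2000)
print(f"Observed 73, adjusted {adjusted:.1f}")
```

In this made-up example the observed score sits above 70 while the adjusted score falls below it, which is exactly the kind of case the experts end up fighting over.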

Ronell Wilson, New York

Take the case of Ronell Wilson in New York, who murdered two undercover police officers. His nine-day Atkins hearing earlier this winter featured seven experts dissecting nine IQ scores obtained over a 13-year period. In its 55-page opinion, the U.S. District Court spent many pages explaining why a 95 percent confidence interval (a range of two standard errors of measurement on either side of a score, something commonly reported in clinical practice) was inappropriate in Atkins claims, because it could place people into the range of mental retardation even if they score well above 70 on IQ tests. The court instead opted for a 66 percent confidence level. Either way, it was all much ado about nothing: "Even after taking into account the possibility of measurement error, the Flynn Effect, and (to a limited extent) the practice effect," Wilson's IQ scores ranging from 70 to 84 were "simply too high to qualify him under the definition of significantly subaverage intellectual functioning."
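To see why the width of the confidence band matters so much, here is a rough sketch of how such bands are built. It assumes normally distributed measurement error; the SEM of 2.5 points and the score of 75 are hypothetical values chosen for illustration, not figures from the Wilson opinion.

```python
import math

def coverage(k_sems: float) -> float:
    """Share of a normal distribution lying within k_sems of the center."""
    return math.erf(k_sems / math.sqrt(2))

def confidence_band(score: float, sem: float, k_sems: float) -> tuple:
    """Band of k_sems standard errors of measurement on either side of a score."""
    return (score - k_sems * sem, score + k_sems * sem)

# Assumptions for illustration only: SEM of 2.5 points, observed score of 75.
ASSUMED_SEM = 2.5
observed = 75

for k in (1, 2):
    low, high = confidence_band(observed, ASSUMED_SEM, k)
    print(f"{k} SEM: {low:.1f} to {high:.1f} (coverage about {coverage(k):.0%})")
```

With these assumed numbers, the two-SEM band reaches all the way down to 70 while the one-SEM band stops at 72.5. One SEM on either side covers roughly 68 percent of a normal distribution, close to the 66 percent level mentioned above.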

As Peter Aldhous reports in the New Scientist, the outcomes of these IQ battles vary widely by jurisdiction (and quality of lawyering, I would imagine). Overall, 38 percent of Atkins claims are successful, according to a study at Cornell Law School, but the success rate is 81 percent in North Carolina compared with only 12 percent in Alabama. A convicted killer named Earl Davis with IQ scores of 75, 76, 65 and 70 was spared execution on the basis of the Flynn effect. But that same effect was not persuasive in the case of Kevin Green of Virginia, whose mean IQ score was actually three points lower than Davis's (71, 55, 74 and 74); Green was executed in 2008.
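The "three points lower" comparison can be checked directly from the scores quoted above. This is just the arithmetic behind the sentence, using nothing beyond those eight numbers.

```python
from statistics import mean

# IQ scores as quoted in the paragraph above.
davis_scores = [75, 76, 65, 70]
green_scores = [71, 55, 74, 74]

davis_mean = mean(davis_scores)
green_mean = mean(green_scores)

print(f"Davis mean: {davis_mean}")
print(f"Green mean: {green_mean}")
print(f"Difference: {davis_mean - green_mean}")
```

Davis averages 71.5 and Green 68.5, a gap of exactly three points.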

Texas, meanwhile, which has carried out more than one-third of all executions in the United States since capital punishment was reinstated, has come up with its own unique standard of mental retardation, based on the character Lennie from John Steinbeck's Of Mice and Men. Wrote the Texas Court of Criminal Appeals in a 2004 explication of the level of mental retardation necessary to avoid the death penalty:

"Texas citizens might agree that Steinbeck's Lennie should, by virtue of his lack of reasoning ability and adaptive skills, be exempt. But, does a consensus of Texas citizens agree that all persons who might legitimately qualify for assistance under the social services definition of mental retardation be exempt from an otherwise constitutional penalty?"

A technical spectacle

Whereas in the real world intelligence and insanity are continuous variables, the law chooses to treat them as dichotomous. Psychologists assist in promoting this legal fiction, helping to sort the condemned into discrete categories of sane or insane, mentally retarded or able-minded. Although the tests we use are supposedly objective, data in this highly polarized area can be skewed to favor one outcome or the other. Neuropsychology experts hired by the defense may focus on the Flynn Effect and argue for large confidence bands around IQ scores. Meanwhile, at least one "go-to" psychologist for prosecutors in Texas took a decidedly different approach, systematically skewing data so that more marginally functioning men were made eligible for execution.

Denkowski's Atkins cases, Texas Observer

George Denkowski developed his own method of evaluating Atkins claims, based on his idea that individuals on Death Row may do poorly on traditional tests because of cultural and social factors rather than lack of intellectual ability. So he discounted evidence that defendants, for example, could not count money or take care of their basic hygiene, reasoning that maybe they just were not taught those skills. With an inmate named Daniel Plata, for example, Denkowski bumped up his IQ score from 70 to 77 and his score on a test of adaptive functioning from 61 to 71. He even published an article in the American Journal of Forensic Psychology in 2008 in which he explained this system of clinical overrides. Complaints by fellow psychologists that his technique had no scientific basis eventually led the Texas State Board of Examiners of Psychologists to issue a reprimand and to bar him from conducting future intellectual disability evaluations in criminal cases. He admitted no legal wrongdoing but agreed to a $5,500 fine -- a pretty lightweight penalty considering that two of the 29 condemned men he evaluated were executed.

Unethical as his method was, it did give attention to the issues of race and class, which may hide in plain sight when appeals revolve around the technical interpretations of psychological test data. It is constitutionally impermissible for race to be considered in capital cases. But it strains credulity to believe race played no role, for example, in the case of eye-plucking Andre Thomas: Thomas is African American, his late wife was white, all of the jurors were white, and four jurors had acknowledged opposition to interracial marriages. In the very last sentence of his closing argument for the death penalty, reported Bookman in the Mother Jones piece, the prosecutor asked jurors whether they would be willing to risk Thomas "asking your daughter out, or your granddaughter out?" This in the town of Sherman, which burned its entire Black district to the ground in 1930 during a race riot triggered by -- what else -- rumors that a Black man had raped a white woman.

Trauma as common denominator

Setting aside the technical criteria for insanity and mental retardation, if one could boil capital cases down to one common denominator, it would be trauma. In my experiences working in the capital trenches, I have found that most Death Row denizens survived horrific childhoods dominated by physical, sexual and emotional torture and neglect, combined with multi-generational patterns of mental illness and violence, all overlaid with hard-core substance abuse.

As forensic psychiatrist Pablo Stewart described this phenomenon in an interview with reporter Scott Johnson at Oakland Effect, a journalism project focusing on violence in Oakland, California, “the fact that there is such consistency on these cases is significant. Some of these people, they just never had a chance.”

* * * * *

Related resources:

The Mother Jones report on Andre Thomas is HERE; the audio podcast, read by M*A*S*H star Mike Farrell, can be downloaded or listened to HERE. My 2009 posts on the Andre Thomas case are HERE and HERE.

My prior posts on the Ford standard of competency and the U.S. Supreme Court's decision in the case of Scott Panetti (with links to court rulings and lots of related resources) are HERE, HERE and HERE. The U.S. Supreme Court's 2007 opinion in Panetti v. Quarterman is HERE. A 28-minute educational video, "Executing the Insane: The Case of Scott Panetti," is available HERE.

My 2010 post on the Denkowski case is HERE.

Psychologist Kevin McGrew's master archive on the Flynn Effect is HERE.

Related books include Michael Perlin's Mental Disability and the Death Penalty: The Shame of the States (the first chapter of which can be previewed HERE) and Daniel Murrie and David DeMatteo's Forensic Mental Health Assessments in Death Penalty Cases.

February 11, 2013

Exercise: Priming students to detect covert biases

In an eye-opening exercise in my graduate forensic psychology course, I had two groups separately analyze a sanitized forensic report. The subject of the report was a 16-year-old boy named "John" who had committed a relatively minor sex offense; the evaluation issue was treatment amenability. After independent group discussions, the two groups shared their impressions as follows:

Group A: "John has a conduct disorder and is narcissistic. His misconduct appears to be escalating. There are ominous warning signs of budding psychopathy. He is at a crossroads in his life; he could go bad fast."

Group B: "This report is biased. The evaluator has joined with John's mother, and is channeling the mother's antagonism toward John. There is evidence of racism, homophobia, and political conservatism. The evaluator’s antipathy toward John feels personal – perhaps he has a wayward teenage son?"

The two groups looked across the table at each other, flabbergasted. Some suspected a trick. "Did you really give us the same report to read?" one student queried.

Yes, everyone had read the identical report. And, in case you wondered, group selection was random; there were no baseline differences that would explain the groups' divergent opinions.

Rather, the difference was in how the two groups were primed to read the report. Their instructions:

Group A: "Read the report with the goal of trying to understand John. What makes him tick? Does he have any potential clinical diagnoses? What is your prognosis for his future?"

Group B: "Read the report with the goal of trying to understand the perspective of the report writer. Do you see any problems with his method or his analysis? If so, do they suggest any potential biases?"

This was no abstract academic exercise. Channeling John’s hateful mother, this seminal report reads like something torn from the pages of an Ann Rule novel, replete with enough (uncorroborated) animal torture and arson to excite any true believer in the infamous Macdonald Triad. Going unchallenged at the time, the report had a hugely prejudicial impact on decision-makers. For years to come, institutional bureaucrats and forensic experts quoted liberally from it to bolster their opinions that John was dangerous.

This is not an isolated or unusual case. Alarmist reports like this have remarkable staying power, their uncorroborated claims taking on a life of their own as they ripple through their subjects' lives, eluding rational analysis or contestation. The power of a single forensic evaluator is truly frightening at times.

Cutting through the hype

So how did a group of graduate students manage to see through the hype that had buffaloed seasoned professionals, to take the measure of the evaluator and expose his subterranean biases? Remarkably, all it took was a simple admonition to think critically, and to be alert to potential biases.

Ideally, we should always be exercising these analytical faculties. We should train ourselves to simultaneously process at least two units of analysis, asking ourselves both:

A. What does this report tell us about its subject?

B. What are the limitations of this report? How might its findings be unreliable, perhaps flawed by inaccurate or insufficient information, unconscious assumptions and biases, or other factors?

Cognitive biases

In the class exercise, Group A was focused only on Question A, whereas Group B focused on Question B. When forensic experts review a report, our approach should be bidirectional, and incorporate both perspectives.

Constructive skepticism benefits from an understanding of cognitive biases and how they work. In the instant case, the most obvious of these was confirmatory bias. This is the tendency to actively seek out and assign more weight to information that confirms one's prior beliefs, discounting or ignoring disconfirmatory data. Clinicians who fall under the spell of psychopathy theory, for example, tend to see psychopaths lurking behind every bush. A clue to the author's preconceptions in John's case was found in a footnote citing Stanton Samenow’s Inside the Criminal Mind, an influential but decidedly polemical treatise that vigorously disavows social factors in crime and -- as its title implies -- caricatures criminals as a breed apart from normal human beings.

Once you detect such selective perception in play, you may see related cognitive biases which the discerning expert should always be on the lookout for in forensic (and other) reports. These include, but are not limited to:

Salience bias, in which inordinate attention is paid to exotic or highly distinctive information, at the expense of ordinary features of a case that may be important. In John's case, the evaluator overweighted the mother's fanciful tales about John's early childhood ("He never cried like a normal baby!"), while ignoring more proximate evidence of John's confusion over his sexuality. In criminal cases, salience bias often contributes to racial stereotyping.

Hindsight bias, or the tendency to see events as more predictable than they were before they took place. Using hindsight, forensic experts are prone to overvalue known facts that tend to explain an event; a countermeasure is to deliberately consider information that supports alternate conclusions.

Availability bias, in which the probability of an event is judged by how easy it is to think of examples. Especially when combined with ignorance of base rates, this can lead to a tendency to overpredict dramatic events, even when -- as in the case of black swans -- their likelihood is actually low.

Illusory correlation, in which a relationship is imagined between variables that are in fact unrelated. In John's case, the mother's dramatic tales -- even if true -- may have had little or nothing to do with John's teenage misconduct. However, when read by subsequent decision-makers in a cultural climate that privileges psychopathy as an explanation for criminal conduct, they had an enormously prejudicial impact.

(Wikipedia maintains an exhaustive list of these decision-making biases, along with links to their definitions.)

To avoid perpetuating biases, forensic evaluators should train themselves to think like "Agent J" in Men in Black. Rather than jumping to superficially plausible conclusions, try to consciously develop alternate hypotheses and test their fit with the evidence. This scientific mindset kept Agent J (Will Smith) from assuming that little Tiffany, a blonde girl carrying quantum physics textbooks through the ghetto at night, was the innocent party just because she did not superficially resemble the monsters who were also out and about. In my class, I show the scene from Men in Black in which Agent J explains his logic in shooting Tiffany -- rather than the monsters -- during a simulation training.

February 7, 2013

Fremantle to host Australian forensic conference

I hope all of you Aussies out there are aware of the exciting forensic psychology conference coming up in April. The theme is timely: "The Times are a Changin': Controversies, Competencies, and DSM-5." Robert Krueger, a personality researcher at the University of Minnesota and a member of the Personality Disorders Workgroup for the DSM-5, will give a keynote focusing on issues specific to using the DSM-5 personality disorders in court. The other keynote speaker is Jane Goodman-Delahunty of Charles Sturt University, a prominent psychologist and attorney who will speak about psychological injuries from workplace harassment. The setting, for those of you who might want to travel to Australia to attend, is the western city of Fremantle, which bills itself as the best preserved 19th Century seaport in the world. (The conference alternates between eastern and western Australia; when I gave a keynote there two years ago, it was held in the idyllic resort setting of Noosa, in southern Queensland.) The website for the April 18-20 event is HERE; the full program can be downloaded HERE. Don't procrastinate too long, as early-bird registration ends March 18.

Panorama of the Swan River Settlement (Fremantle), 1831 (Source: Wikimedia Commons)

February 5, 2013

Texas SVP jurors ignoring actuarial risk scores

Expert witness for defense makes a (small) difference, study finds

The fiery debates surrounding the validity of actuarial tools to predict violence risk raise the question: How much influence do these instruments really have on legal decision-makers? The answer, at least when it comes to jurors in Sexually Violent Predator trials in Texas:

Not much.

"Despite great academic emphasis on risk measures - and ongoing debates about the value, accuracy, and utility of risk-measure scores reported in SVP hearings - our findings suggest these risk measure scores may have little impact on jurors in actual SVP hearings."

The researchers surveyed 299 jurors at the end of 26 sexually violent predator trials. Unfortunately, they could not directly measure the relationship between risk scores and civil commitment decisions because, this being Texas, juries slam-dunked 25 out of 26 sex offenders, hanging in only one case (which ultimately ended in commitment after a retrial).

Instead of the ultimate legal outcome, the researchers had to rely on proxy outcome measures, including jurors' ratings of how dangerous an individual was (specifically, how likely he would be to commit a new sex offense within one year of release), and their assessment of how difficult it was to make a decision in their case.

There was no evidence that jurors' assessments of risk or decision difficulty varied based on respondents' scores on risk assessment tools, which in each case included the Static-99, MnSOST-R and the PCL-R. This finding, by the prolific team of Marcus Boccaccini, Daniel Murrie and colleagues, extends into the real world prior mock trial evidence that jurors in capital cases and other legal proceedings involving psychology experts are more heavily influenced by clinical than actuarial testimony.

What did make a difference to jurors was whether the defense called at least one witness, and in particular an expert witness. Overall, there was a huge imbalance in expert testimony, with almost all of the trials featuring two state experts, but only seven of 26 including even one expert called by the defense.

"Skepticism effect"

The introduction of a defense expert produced a "skepticism effect," the researchers found, in which jurors became more skeptical of experts' ability to predict future offending. However, jurors' lower risk ratings in these cases could also have been due to real differences in the cases. In SVP cases involving legitimately dangerous sex offenders, defense attorneys often have trouble finding experts willing to testify. In other words, the researchers note, "the reduced ratings of perceived risk associated with the presence of a defense expert may be due to nonrandom selection … as opposed to these defense experts' influencing jurors."

A back story here pertains to the jury pool in the Texas county in which civil commitment trials are held. All SVP trials take place in Montgomery County, a "very white community," an attorney there told me. A special e-juror selection process for SVP jurors whitens the jury pool even more, disproportionately eliminating Hispanics and African Americans. Meanwhile, many of those being referred for civil commitment are racial minorities. The potentially unconstitutional racial discrepancy is the basis for one of many current legal challenges to the SVP system in Texas.

Once a petition for civil commitment as a sexually violent predator is filed in Texas, the outcome is a fait accompli. Since the inception of the state's SVP law, only one jury has unanimously voted against civil commitment. Almost 300 men have been committed, and not a single one has been released.

Overall, the broad majority of jurors in the 26 SVP trials were of the opinion that respondents were likely to reoffend in the next year. Based on this heightened perception of risk, the researchers hypothesize that jurors may have found precise risk assessment ratings irrelevant because any risk was enough to justify civil commitment.

In a previous survey of Texas jurors, more than half reported that even a 1 percent chance of recidivism was enough to qualify a sex offender as dangerous. To be civilly committed in Texas, a sex offender must be found "likely" to reoffend, but the state's courts have not clarified what that term means.

Risk scores could also be irrelevant to jurors motivated more by a desire for retribution than a genuine wish to protect the public, the researchers pointed out. "Although SVP laws are ostensibly designed to provide treatment and protect the public, experimental research suggests that many mock jurors make civil commitment decisions based more on retributive motives - that is, the desire to punish sexual offenses - than the utilitarian goal of protecting the public…. Jurors who adopt this mindset may spend little time thinking about risk-measure scores."

All this is not to say that actuarial scores are irrelevant. They are highly influential in the decisions that take place leading up to an SVP trial, including administrative referrals for full evaluations, the opinions of the evaluators themselves as to whether an offender meets civil commitment criteria, and decisions by prosecutors as to which cases to select for trial.

"But the influence of risk scores appears to end at the point when laypersons make decisions about civilly committing a select subgroup of sexual offenders," the researchers noted.

Bottom line: Once a petition for civil commitment as a sexually violent predator is filed in Texas, it's the end of the line. The juries are ultra-punitive, and the deck is stacked, with government experts outnumbering experts called by the defense in every case. It remains unclear to what extent these results might generalize to SVP proceedings in other states with less conservative jury pools and/or more balanced proceedings.

The study, "Do Scores From Risk Measures Matter to Jurors?" by Marcus Boccaccini, Darrel Turner, Craig Henderson and Caroline Chevalier of Sam Houston State University and Daniel Murrie of the University of Virginia, is slated for publication in an upcoming issue of Psychology, Public Policy, and Law. To request a copy, email the lead researcher (HERE).

January 31, 2013

Upcoming forensic training opportunities

American Psychology-Law Society, March 7-9

For those of you planning to attend the American Psychology-Law Society conference in Portland, Oregon on March 7-9, early-bird registration ends on February 1. This year’s lineup is very exciting. The conference website is HERE; details on the March 8 symposium that I will be chairing, "Emergent controversies in civil commitment evaluations of sexually violent predators" (as well as the pre-conference continuing education lineup) are HERE.

Juvenile sex offender training, Feb. 11

Coming right up at the Institute of Law, Psychiatry, and Public Policy in Charlottesville, Virginia is an advanced training on “Understanding and Treating Juvenile Sexual Offenders.” The presenter is nationally recognized expert John Hunter. Details are HERE.

Ethics in forensic practice training, May 3

Later in the spring, the Institute is offering another advanced workshop by nationally recognized presenter Alan Goldstein. Topics include competence to practice; confidentiality; roles of the expert; issues in assessment; responding to subpoenas; release of raw test data; report writing; and ultimate opinion testimony. The emphasis is on reducing the likelihood of potentially damaging cross-examination, ethics complaints and malpractice actions. Attendees are encouraged to prepare, in advance, a problem they encountered in their practice and submit it at the start of the workshop for review and possible discussion. More information is HERE.

January 27, 2013

Showdown looming over predictive accuracy of actuarials

Large error rates thwart individual risk prediction

Brett Jordan David Macdonald (Creative Commons license)

If you are involved in risk assessments in any way (and what psychology-law professional is not, given the current cultural landscape?), now is the time to get up to speed on a major challenge that's fast gaining recognition.

At issue is whether the margins of error around scores are so wide as to prevent reliable prediction of an individual's risk, even as risk instruments show some (albeit weak) predictive accuracy on a group level. If the problem is unsolvable, as critics maintain, then actuarial tools such as the Static-99 and VRAG should be barred from court, where they can literally make the difference between life and death.

The debate has been gaining steam since 2007, with a series of back-and-forth articles in academic journals (see below). Now, the preeminent journal Behavioral Sciences and the Law has published findings by two leading forensic psychologists from Canada and Scotland that purport to demonstrate once and for all that the problem is "an accurate characterization of reality" rather than a statistical artifact as the actuarials' defenders had argued.

So-called actuarial tools have become increasingly popular over the last couple of decades in response to legal demand. Instruments such as the Static-99 (for sexual risk) and the VRAG (for general violence risk) provide quick-and-dirty ways to guess at an individual's risk of violent or sexual recidivism. Offenders are scored on a set of easy-to-collect variables, such as age and number of prior convictions. The assumption is that an offender who attains a certain score resembles the larger group of offenders in that score range, and therefore is likely to reoffend at the same rate as the collective.

Responding to criticisms of the statistical techniques they used in their previous critiques, Stephen Hart of Simon Fraser University and David Cooke of Glasgow Caledonian University developed an experimental actuarial tool that worked on par with existing actuarials to separate offenders into high- and low-risk groups.* The odds of sexual recidivism for subjects in the high-risk group averaged 4.5 times that of those in the low-risk group. But despite this large average difference, the researchers established through a traditional statistical procedure, logistic regression, that the margins of error around individual scores were so large as to make risk distinctions between individuals "virtually impossible." In only one out of 90 cases was it possible to say that a subject's predicted risk of failure was significantly higher than the overall baseline of 18 percent. (See figure.)

Vertical lines show confidence intervals for individual risk estimates; these large ranges would be required in order to reach the traditional 95 percent level of certainty.

The brick wall limiting predictive accuracy at the individual level is not specific to violence risk. Researchers in more established fields, such as medical pathology, have also hit it. Many of you will know of someone diagnosed with a cancer and given six months to live who managed to soldier on for years (or, conversely, who bit the dust in a matter of weeks). Such cases are not flukes: They owe to the fact that the six-month figure is just a group average, and cannot be accurately applied to any individual cancer patient.

Attempts to resolve this problem via new technical procedures are "a waste of time," according to Hart and Cooke, because the problem is due to the "fundamental uncertainty in individual-level violence risk assessment, one that cannot be overcome." In other words, trying to precisely predict the future using "a small number of risk factors selected primarily on pragmatic grounds" is futile; all the analyses in the world "will not change reality."

Legal admissibility questionable

The current study has grave implications for the legal admissibility of actuarial instruments in court. Jurisdictions that rely upon the Daubert evidentiary standard should not be allowing procedures for which the margins of error are "large, unknown, or incalculable," Hart and Cooke warn.

By offering risk estimates in the form of precise odds of a new crime within a specific period of time, actuarial methods present an image of certitude. This is especially dangerous when that accuracy is illusory. Being told that an offender "belongs to a group with a 78 percent likelihood of committing another violent offense within seven years" is highly prejudicial and may poison the judgment of triers of fact. More covertly, it influences the judgment of the clinician as well, who -- through a process known as "anchoring bias" -- may tend to judge other information in a case in light of the individual's actuarial risk score.

Classic '56 Chevy in Cuba. Photo credit: Franciscovies

With professional awareness of this issue growing, it is not only irresponsible but ethically indefensible not to inform the courts or others who retain our services about the limitations of actuarial risk assessment. The Ethics Code of the American Psychological Association, for example, requires informing clients of "any significant limitations of [our] interpretations." Unfortunately, I rarely (if ever) see limitations adequately disclosed, either in written reports or court testimony, by evaluators who rely upon the Static-99, VRAG, Psychopathy Checklist-Revised (which Cooke and statistician Christine Michie of Glasgow University tackled in a 2010 study) and similar instruments in forming opinions about individual risk.

In fact, more often than not I see the opposite: Evaluators tout the actuarial du jour as being far more accurate than "unstructured clinical judgment." That's like an auto dealer telling you, in response to your query about a vehicle's gas mileage, that it gets far more miles per gallon than your old 1956 Chevy. Leaving aside Cuba (where a long-running U.S. embargo hampers imports), there are about as many gas-guzzling '56 Chevys on the roads in 2013 as there are forensic psychologists relying on unstructured clinical judgment to perform risk assessments.

Time to give up the ghost?

Hart and Cooke recommend that forensic evaluators stop the practice of using these statistical algorithms to make "mechanistic" and "formulaic" predictions. They are especially critical of the practice of providing specific probabilities of recidivism, which are highly prejudicial and likely to be inaccurate.

"This actually isn’t a radical idea; until quite recently, leading figures in the field of forensic mental health [such as Tom Grisso and Paul Appelbaum] argued that making probabilistic predictions was questionable or even ill advised," they point out. “Even in fields where the state of knowledge is arguably more advanced, such as medicine, it is not routine to make individual predictions.”

They propose instead a return to evidence-based approaches that more holistically consider the individual and his or her circumstances:

From both clinical and legal perspectives, it is arbitrary and therefore inappropriate to rely solely on a statistical algorithm developed a priori - and therefore developed without any reference to the facts of the case at hand - to make decisions about an individual, especially when the decision may result in deprivation of liberties. Instead, good practice requires a flexible approach, one in which professionals are aware of and rely on knowledge of the scientific literature, but also recognize that their decisions ultimately require consideration of the totality of circumstances - not just the items of a particular test.

In the short run, I am skeptical that this proposal will be accepted. The foundation underlying actuarial risk assessment may be hollow, but too much construction has occurred atop it. Civil commitment schemes rely upon actuarial tools to lend an imprimatur of science, and statutes in an increasing number of U.S. states mandate use of the Static-99 and related statistical algorithms in institutional decision-making.

The long-term picture is more difficult to predict. We may look back sheepishly on today's technocratic approaches, seeing them as emblematic of overzealous and ignorant pandering to public fear. Or -- more bleakly -- we may end up with a rigidly controlled society like that depicted in the sci-fi drama Gattaca, in which supposedly infallible scientific tests determine (and limit) the future of each citizen.

* * * * *

I recommend the article, "Another Look at the (Im-)Precision of Individual Risk Estimates Made Using Actuarial Risk Assessment Instruments." It's part of an upcoming special issue on violence risk assessment, and it provides a detailed discussion of the history and parameters of the debate. (Click HERE to request it from Dr. Hart.) Other articles in the debate include the following (in rough chronological order):

Hart, S. D., Michie, C. and Cooke, D. J. (2007a). Precision of actuarial risk assessment instruments: Evaluating the "margins of error" of group v. individual predictions of violence. British Journal of Psychiatry, 190, s60–s65.

Mossman, D. and Sellke, T. (2007). Avoiding errors about "margins of error" [Letter]. British Journal of Psychiatry, 191, 561.

Harris, G. T., Rice, M. E. and Quinsey, V. L. (2008). Shall evidence-based risk assessment be abandoned? [Letter]. British Journal of Psychiatry, 192, 154.

Cooke, D. J. and Michie, C. (2010). Limitations of diagnostic precision and predictive utility in the individual case: A challenge for forensic practice. Law and Human Behavior, 34, 259–274.

Hanson, R. K. and Howard, P. D. (2010). Individual confidence intervals do not inform decision makers about the accuracy of risk assessment evaluations. Law and Human Behavior, 34, 275–281.

*The experimental instrument used for this study was derived from the SVR-20, a structured professional judgment tool. The average recidivism rate among the total sample was 18 percent, with 10 percent of offenders in the low-risk group and 33 percent of those in the high-risk group reoffending. The instrument's Area Under the Curve, a measure of predictive validity, was .72, which is in line with that of other actuarial instruments.
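
As a rough check on the figures in this footnote, the short Python sketch below converts the reported recidivism rates (10 percent and 33 percent) into odds and takes their ratio. The function and variable names are my own illustrative choices, not anything taken from Hart and Cooke's paper, and because the published rates are rounded, the result only approximates the 4.5 figure cited above. The sketch also works out what share of the sample would fall in the low-risk group, on the assumption that the 18 percent overall rate is simply a weighted average of the two group rates.

```python
# Back-of-the-envelope check on the group-level figures in the footnote.
# All inputs are the rounded percentages reported in the post; nothing here
# reproduces the authors' logistic regression or their individual-level intervals.

LOW_RISK_RATE = 0.10   # recidivism rate in the low-risk group
HIGH_RISK_RATE = 0.33  # recidivism rate in the high-risk group
OVERALL_RATE = 0.18    # recidivism rate in the total sample


def odds(p):
    """Convert a probability (0 < p < 1) into odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p_high, p_low):
    """Odds of the outcome in the high group relative to the low group."""
    return odds(p_high) / odds(p_low)


def implied_low_share(p_low, p_high, p_overall):
    """Share of the sample in the low group, assuming the overall rate is a
    weighted average: w * p_low + (1 - w) * p_high = p_overall."""
    return (p_high - p_overall) / (p_high - p_low)


if __name__ == "__main__":
    ratio = odds_ratio(HIGH_RISK_RATE, LOW_RISK_RATE)
    share = implied_low_share(LOW_RISK_RATE, HIGH_RISK_RATE, OVERALL_RATE)
    print(f"Odds ratio, high vs. low risk: {ratio:.2f}")
    print(f"Implied share of sample in low-risk group: {share:.0%}")
```

With these rounded inputs the ratio comes out a little under 4.5. Note that this is purely a group-level comparison; it says nothing about the width of the confidence interval around any one individual's estimate, which is the crux of Hart and Cooke's argument.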

January 20, 2013

Martin Luther King: Back by popular demand

For the third year in a row, in honor of Martin Luther King Day, I am excerpting a portion of a keynote speech the visionary civil rights leader delivered at the 1967 convention of the American Psychological Association. This was just seven months before he was gunned down, and at a time when he was drawing larger connections between racial oppression and the Vietnam War. I have made the entire speech, "The Role of the Behavioral Scientist in the Civil Rights Movement," available for download (HERE). Forensic psychologists who follow my blog have told me they appreciate it. Forty-six years old, the speech remains relevant today.

There are certain technical words in every academic discipline which soon become stereotypes and even clichés. Every academic discipline has its technical nomenclature. You who are in the field of psychology have given us a great word. It is the word maladjusted. This word is probably used more than any other word in psychology. It is a good word; certainly it is good that in dealing with what the word implies you are declaring that destructive maladjustment should be destroyed. You are saying that all must seek the well-adjusted life in order to avoid neurotic and schizophrenic personalities.

But on the other hand, I am sure that we will recognize that there are some things in our society, some things in our world, to which we should never be adjusted. There are some things concerning which we must always be maladjusted if we are to be people of good will. We must never adjust ourselves to racial discrimination and racial segregation. We must never adjust ourselves to religious bigotry. We must never adjust ourselves to economic conditions that take necessities from the many to give luxuries to the few. We must never adjust ourselves to the madness of militarism, and the self-defeating effects of physical violence....

Thus, it may well be that our world is in dire need of a new organization, The International Association for the Advancement of Creative Maladjustment. Men and women should be as maladjusted as the prophet Amos, who in the midst of the injustices of his day, could cry out in words that echo across the centuries, 'Let justice roll down like waters and righteousness like a mighty stream'; or as maladjusted as Abraham Lincoln, who in the midst of his vacillations finally came to see that this nation could not survive half slave and half free; or as maladjusted as Thomas Jefferson, who in the midst of an age amazingly adjusted to slavery, could scratch across the pages of history, words lifted to cosmic proportions, 'We hold these truths to be self evident, that all men are created equal. That they are endowed by their creator with certain inalienable rights. And that among these are life, liberty, and the pursuit of happiness.' And through such creative maladjustment, we may be able to emerge from the bleak and desolate midnight of man’s inhumanity to man, into the bright and glittering daybreak of freedom and justice.

I have not lost hope. I must confess that these have been very difficult days for me personally. And these have been difficult days for every civil rights leader, for every lover of justice and peace.

* * * * *

My longer excerpt, posted January 2011, is HERE. The full speech is HERE.