February 11, 2013

Exercise: Priming students to detect covert biases

In an eye-opening exercise in my graduate forensic psychology course, I had two groups separately analyze a sanitized forensic report. The subject of the report was a 16-year-old boy named "John" who had committed a relatively minor sex offense; the evaluation issue was treatment amenability. After independent group discussions, the two groups shared their impressions as follows:

Group A: "John has a conduct disorder and is narcissistic. His misconduct appears to be escalating. There are ominous warning signs of budding psychopathy. He is at a crossroads in his life; he could go bad fast."

Group B: "This report is biased. The evaluator has joined with John's mother, and is channeling the mother's antagonism toward John. There is evidence of racism, homophobia, and political conservatism. The evaluator’s antipathy toward John feels personal – perhaps he has a wayward teenage son?"

The two groups looked across the table at each other, flabbergasted. Some suspected a trick. "Did you really give us the same report to read?" one student queried.

Yes, everyone had read the identical report. And, in case you wondered, group selection was random; there were no baseline differences that would explain the groups' divergent opinions.

Rather, the difference was in how the two groups were primed to read the report. Their instructions:

Group A: "Read the report with the goal of trying to understand John. What makes him tick? Does he have any potential clinical diagnoses? What is your prognosis for his future?"

Group B: "Read the report with the goal of trying to understand the perspective of the report writer. Do you see any problems with his method or his analysis? If so, do they suggest any potential biases?"

This was no abstract academic exercise. Channeling John’s hateful mother, this seminal report reads like something torn from the pages of an Ann Rule novel, replete with enough (uncorroborated) animal torture and arson to excite any true believer in the infamous Macdonald Triad. Going unchallenged at the time, the report had a hugely prejudicial impact on decision-makers. For years to come, institutional bureaucrats and forensic experts quoted liberally from it to bolster their opinions that John was dangerous.

This is not an isolated or unusual case. Alarmist reports like this have remarkable staying power, their uncorroborated claims taking on a life of their own as they ripple through their subjects' lives, eschewing rational analysis or contestation. The power of a single forensic evaluator is truly frightening at times.

Cutting through the hype


So how did a group of graduate students manage to see through the hype that had buffaloed seasoned professionals, to take the measure of the evaluator and expose his subterranean biases? Remarkably, all it took was a simple admonition to think critically, and to be alert to potential biases.

Ideally, we should always be exercising these analytical faculties. We should train ourselves to simultaneously process at least two units of analysis, asking ourselves both:

A. What does this report tell us about its subject?

B. What are the limitations of this report? How might its findings be flawed, whether by unreliable or insufficient information, unconscious assumptions and biases, or other factors?

Cognitive biases


In the class exercise, Group A was focused only on Question A, whereas Group B focused on Question B. When forensic experts review a report, our approach should be bidirectional, and incorporate both perspectives.

Constructive skepticism benefits from an understanding of cognitive biases and how they work. In the instant case, the most obvious of these was confirmatory bias. This is the tendency to actively seek out and assign more weight to information that confirms one's prior beliefs, discounting or ignoring disconfirmatory data. Clinicians who fall under the spell of psychopathy theory, for example, tend to see psychopaths lurking behind every bush. A clue to the author's preconceptions in John's case was found in a footnote citing Stanton Samenow’s Inside the Criminal Mind, an influential but decidedly polemical treatise that vigorously disavows social factors in crime and -- as its title implies -- caricatures criminals as a breed apart from normal human beings.

Once you detect such selective perception in play, you may see related cognitive biases which the discerning expert should always be on the lookout for in forensic (and other) reports. These include, but are not limited to:

  • Salience bias, in which inordinate attention is paid to exotic or highly distinctive information, at the expense of ordinary features of a case that may be important. In John's case, the evaluator overweighted the mother's fanciful tales about John's early childhood ("He never cried like a normal baby!"), while ignoring more proximate evidence of John's confusion over his sexuality. In criminal cases, salience bias often contributes to racial stereotyping.

  • Hindsight bias, or the tendency to see events as more predictable than they were before they took place. Using hindsight, forensic experts are prone to overvalue known facts that tend to explain an event; a countermeasure is to deliberately consider information that supports alternate conclusions. 

  • Availability bias, in which the probability of an event is judged by how easy it is to think of examples. Especially when combined with ignorance of base rates, this can lead to a tendency to overpredict dramatic events, even when -- as in the case of black swans -- their likelihood is actually low.

  • Illusory correlation, in which a relationship is imagined between variables that are in fact unrelated. In John's case, the mother's dramatic tales -- even if true -- may have had little or nothing to do with John's teenage misconduct. However, when read by subsequent decision-makers in a cultural climate that privileges psychopathy as an explanation for criminal conduct, they had an enormously prejudicial impact. 

(Wikipedia maintains an exhaustive list of these decision-making biases, along with links to their definitions.)

To avoid perpetuating biases, forensic evaluators should train themselves to think like "Agent J" in Men in Black. Rather than jumping to superficially plausible conclusions, try to consciously develop alternate hypotheses and test their fit with the evidence. This scientific mindset kept Agent J (Will Smith) from assuming that little Tiffany, a blonde girl carrying quantum physics textbooks through the ghetto at night, was the innocent party just because she did not superficially resemble the monsters who were also out and about. Here is the scene from Men in Black that I show in my class, in which Agent J explains his logic in shooting Tiffany -- rather than the monsters -- during a simulation training:

February 7, 2013

Fremantle to host Australian forensic conference

I hope all of you Aussies out there are aware of the exciting forensic psychology conference coming up in April. The theme is timely: "The Times are a Changin': Controversies, Competencies, and DSM-5." Robert Krueger, a personality researcher at the University of Minnesota and a member of the Personality Disorders Workgroup for the DSM-5, will give a keynote focusing on issues specific to using the DSM-5 personality disorders in court. The other keynote speaker is Jane Goodman-Delahunty of Charles Sturt University, a prominent psychologist and attorney who will speak about psychological injuries from workplace harassment. The setting, for those of you who might want to travel to Australia to attend, is the western city of Fremantle, which bills itself as the best preserved 19th Century seaport in the world. (The conference alternates between eastern and western Australia; when I gave a keynote there two years ago, it was held in the idyllic resort setting of Noosa, in southern Queensland.) The website for the April 18-20 event is HERE; the full program can be downloaded HERE. Don't procrastinate too long, as early-bird registration ends March 18.

Panorama of the Swan River Settlement (Fremantle), 1831 (Source: Wikimedia Commons)

February 5, 2013

Texas SVP jurors ignoring actuarial risk scores

Expert witness for defense makes a (small) difference, study finds

The fiery debates surrounding the validity of actuarial tools to predict violence risk raise the question: How much influence do these instruments really have on legal decision-makers? The answer, at least when it comes to jurors in Sexually Violent Predator trials in Texas:

Not much.

"Despite great academic emphasis on risk measures - and ongoing debates about the value, accuracy, and utility of risk-measure scores reported in SVP hearings - our findings suggest these risk measure scores may have little impact on jurors in actual SVP hearings."

The researchers surveyed 299 jurors at the end of 26 sexually violent predator trials. Unfortunately, they could not directly measure the relationship between risk scores and civil commitment decisions because, this being Texas, juries slam-dunked 25 out of 26 sex offenders, hanging in only one case (which ultimately ended in commitment after a retrial).  

Instead of the ultimate legal outcome, the researchers had to rely on proxy outcome measures, including jurors' ratings of how dangerous an individual was (specifically, how likely he would be to commit a new sex offense within one year of release), and their assessment of how difficult it was to make a decision in their case.

There was no evidence that jurors' assessments of risk or decision difficulty varied based on respondents' scores on risk assessment tools, which in each case included the Static-99, MnSOST-R and the PCL-R. This finding, by the prolific team of Marcus Boccaccini, Daniel Murrie and colleagues, extends into the real world prior mock trial evidence that jurors in capital cases and other legal proceedings involving psychology experts are more heavily influenced by clinical than actuarial testimony.

What did make a difference to jurors was whether the defense called at least one witness, and in particular an expert witness. Overall, there was a huge imbalance in expert testimony, with almost all of the trials featuring two state experts, but only seven of 26 including even one expert called by the defense.

"Skepticism effect"

The introduction of a defense expert produced a "skepticism effect," the researchers found, in which jurors became more skeptical of experts' ability to predict future offending. However, jurors' lower risk ratings in these cases could also have been due to real differences in the cases. In SVP cases involving legitimately dangerous sex offenders, defense attorneys often have trouble finding experts willing to testify. In other words, the researchers note, "the reduced ratings of perceived risk associated with the presence of a defense expert may be due to nonrandom selection … as opposed to these defense experts' influencing jurors."

A back story here pertains to the jury pool in the Texas county in which civil commitment trials are held. All SVP trials take place in Montgomery County, a "very white community," an attorney there told me. A special e-juror selection process for SVP jurors whitens the jury pool even more, disproportionately eliminating Hispanics and African Americans. Meanwhile, many of those being referred for civil commitment are racial minorities. The potentially unconstitutional race discrepancy is the basis for one of many current legal challenges to the SVP system in Texas.

Once a petition for civil commitment as a sexually violent predator is filed in Texas, the outcome is a fait accompli. Since the inception of the state's SVP law, only one jury has unanimously voted against civil commitment. Almost 300 men have been committed, and not a single one has been released.

Overall, the broad majority of jurors in the 26 SVP trials were of the opinion that respondents were likely to reoffend in the next year. Based on this heightened perception of risk, the researchers hypothesize that jurors may have found precise risk assessment ratings irrelevant because any risk was enough to justify civil commitment.

In a previous survey of Texas jurors, more than half reported that even a 1 percent chance of recidivism was enough to qualify a sex offender as dangerous. To be civilly committed in Texas, a sex offender must be found "likely" to reoffend, but the state's courts have not clarified what that term means.  

Risk scores could also be irrelevant to jurors motivated more by a desire for retribution than a genuine wish to protect the public, the researchers pointed out. "Although SVP laws are ostensibly designed to provide treatment and protect the public, experimental research suggests that many mock jurors make civil commitment decisions based more on retributive motives - that is, the desire to punish sexual offenses - than the utilitarian goal of protecting the public…. Jurors who adopt this mindset may spend little time thinking about risk-measure scores."

All this is not to say that actuarial scores are irrelevant. They are highly influential in the decisions that take place leading up to an SVP trial, including administrative referrals for full evaluations, the opinions of the evaluators themselves as to whether an offender meets civil commitment criteria, and decisions by prosecutors as to which cases to select for trial.

"But the influence of risk scores appears to end at the point when laypersons make decisions about civilly committing a select subgroup of sexual offenders," the researchers noted.

Bottom line: Once a petition for civil commitment as a sexually violent predator is filed in Texas, it's the end of the line. The juries are ultra-punitive, and the deck is stacked, with government experts outnumbering experts called by the defense in every case. It remains unclear to what extent these results might generalize to SVP proceedings in other states with less conservative jury pools and/or more balanced proceedings.

  • The study, "Do Scores From Risk Measures Matter to Jurors?" by Marcus Boccaccini, Darrel Turner, Craig Henderson and Caroline Chevalier of Sam Houston State University and Daniel Murrie of the University of Virginia, is slated for publication in an upcoming issue of Psychology, Public Policy, and Law. To request a copy, email the lead researcher (HERE).

January 31, 2013

Upcoming forensic training opportunities

American Psychology-Law Society, March 7-9


For those of you planning to attend the American Psychology-Law Society conference in Portland, Oregon, on March 7-9, early-bird registration ends on February 1. This year’s lineup is very exciting. The conference website is HERE; details on the March 8 symposium that I will be chairing, "Emergent controversies in civil commitment evaluations of sexually violent predators" (as well as the pre-conference continuing education lineup), are HERE.

Juvenile sex offender training, Feb. 11


Coming right up at the Institute of Law, Psychiatry, and Public Policy in Charlottesville, Virginia is an advanced training on “Understanding and Treating Juvenile Sexual Offenders.” The presenter is nationally recognized expert John Hunter. Details are HERE.

Ethics in forensic practice training, May 3


Later in the spring, the Institute is offering another advanced workshop by nationally recognized presenter Alan Goldstein. Topics include competence to practice; confidentiality; roles of the expert; issues in assessment; responding to subpoenas; release of raw test data; report writing; and ultimate opinion testimony. The emphasis is on reducing the likelihood of potentially damaging cross-examination, ethics complaints and malpractice actions. Attendees are encouraged to prepare, in advance, a problem they encountered in their practice and submit it at the start of the workshop for review and possible discussion. More information is HERE.

January 27, 2013

Showdown looming over predictive accuracy of actuarials

Large error rates thwart individual risk prediction

Photo credit: Brett Jordan David Macdonald (Creative Commons license)

If you are involved in risk assessments in any way (and what psychology-law professional is not, given the current cultural landscape?), now is the time to get up to speed on a major challenge that's fast gaining recognition.

At issue is whether the margins of error around scores are so wide as to prevent reliable prediction of an individual's risk, even as risk instruments show some (albeit weak) predictive accuracy on a group level. If the problem is unsolvable, as critics maintain, then actuarial tools such as the Static-99 and VRAG should be barred from court, where they can literally make the difference between life and death.

The debate has been gaining steam since 2007, with a series of back-and-forth articles in academic journals (see below). Now, the preeminent journal Behavioral Sciences and the Law has published findings by two leading forensic psychologists from Canada and Scotland that purport to demonstrate once and for all that the problem is "an accurate characterization of reality" rather than a statistical artifact as the actuarials' defenders had argued.

So-called actuarial tools have become increasingly popular over the last couple of decades in response to legal demand. Instruments such as the Static-99 (for sexual risk) and the VRAG (for general violence risk) provide quick-and-dirty ways to guess at an individual's risk of violent or sexual recidivism. Offenders are scored on a set of easy-to-collect variables, such as age and number of prior convictions. The assumption is that an offender who attains a certain score resembles the larger group of offenders in that score range, and therefore is likely to reoffend at the same rate as the collective.

Responding to criticisms of the statistical techniques they used in their previous critiques, Stephen Hart of Simon Fraser University and David Cooke of Glasgow Caledonian University developed an experimental actuarial tool that worked on par with existing actuarials to separate offenders into high- and low-risk groups.* The odds of sexual recidivism for subjects in the high-risk group averaged 4.5 times that of those in the low-risk group. But despite this large average difference, the researchers established through a traditional statistical procedure, logistic regression, that the margins of error around individual scores were so large as to make risk distinctions between individuals "virtually impossible." In only one out of 90 cases was it possible to say that a subject's predicted risk of failure was significantly higher than the overall baseline of 18 percent. (See figure.)

Vertical lines show confidence intervals for individual risk estimates;
these large ranges would be required in order to reach the traditional 95 percent level of certainty.
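
As a rough check on the group-level contrast described above, the odds ratio can be recomputed from the rounded recidivism rates given in the footnote to this post (10 percent in the low-risk group, 33 percent in the high-risk group). This is only a back-of-the-envelope sketch; the function and variable names are mine, not the researchers', and the result from rounded rates lands near, not exactly on, the reported 4.5.

```python
def odds(p):
    """Convert a probability into odds, i.e. p : (1 - p)."""
    return p / (1.0 - p)

# Rounded group recidivism rates as reported in the footnote.
low_rate = 0.10
high_rate = 0.33

# Odds ratio: how many times larger the odds of reoffending are
# in the high-risk group than in the low-risk group.
odds_ratio = odds(high_rate) / odds(low_rate)

# Relative risk, for comparison: the simple ratio of the two rates.
relative_risk = high_rate / low_rate

print(f"odds ratio:    {odds_ratio:.2f}")
print(f"relative risk: {relative_risk:.2f}")
```

Note that both figures describe groups. Neither says anything about how wide the uncertainty is around any one person's estimate, which is the point at issue.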

The brick wall limiting predictive accuracy at the individual level is not specific to violence risk. Researchers in more established fields, such as medical pathology, have also hit it. Many of you will know of someone diagnosed with a cancer and given six months to live who managed to soldier on for years (or, conversely, who bit the dust in a matter of weeks). Such cases are not flukes: They owe to the fact that the six-month figure is just a group average, and cannot be accurately applied to any individual cancer patient.
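
To make the cancer analogy concrete, here is a purely hypothetical illustration. It assumes, for the sake of argument only, that survival times follow a simple exponential distribution and that "six months" is the group median; neither assumption comes from any real clinical data, and the function name is mine.

```python
import math

def exponential_quantile(p, median):
    """Time by which a fraction p of the group has died, under an
    ASSUMED exponential survival model with the given median."""
    rate = math.log(2) / median          # hazard rate implied by the median
    return -math.log(1.0 - p) / rate

median_months = 6.0  # hypothetical "six months to live" group figure

# Survival time of the shortest-lived 5 percent and the longest-lived 5 percent.
early = exponential_quantile(0.05, median_months)
late = exponential_quantile(0.95, median_months)

print(f"5 percent die within {early:.1f} months")
print(f"5 percent survive beyond {late:.1f} months")
```

Under these assumptions, one patient in 20 dies within about two weeks and one in 20 lives more than two years, even though the group figure of six months is exactly right. A correct group statistic can still be a poor forecast for any given individual.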

Attempts to resolve this problem via new technical procedures are "a waste of time," according to Hart and Cooke, because the problem is due to the "fundamental uncertainty in individual-level violence risk assessment, one that cannot be overcome." In other words, trying to precisely predict the future using "a small number of risk factors selected primarily on pragmatic grounds" is futile; all the analyses in the world "will not change reality."

Legal admissibility questionable 

The current study has grave implications for the legal admissibility of actuarial instruments in court. Jurisdictions that rely upon the Daubert evidentiary standard should not be allowing procedures for which the margins of error are "large, unknown, or incalculable," Hart and Cooke warn.

By offering risk estimates in the form of precise odds of a new crime within a specific period of time, actuarial methods present an image of certitude. This is especially dangerous when that accuracy is illusory. Being told that an offender "belongs to a group with a 78 percent likelihood of committing another violent offense within seven years" is highly prejudicial and may poison the judgment of triers of fact. More covertly, it influences the judgment of the clinician as well, who -- through a process known as "anchoring bias" -- may tend to judge other information in a case in light of the individual's actuarial risk score.

Classic '56 Chevy in Cuba. Photo credit: Franciscovies

With professional awareness of this issue growing, it is not only irresponsible but ethically indefensible not to inform the courts or others who retain our services about the limitations of actuarial risk assessment. The Ethics Code of the American Psychological Association, for example, requires informing clients of "any significant limitations of [our] interpretations." Unfortunately, I rarely (if ever) see limitations adequately disclosed, either in written reports or court testimony, by evaluators who rely upon the Static-99, VRAG, Psychopathy Checklist-Revised (which Cooke and statistician Christine Michie of Glasgow University tackled in a 2010 study) and similar instruments in forming opinions about individual risk.

In fact, more often than not I see the opposite: Evaluators tout the actuarial du jour as being far more accurate than "unstructured clinical judgment." That's like an auto dealer telling you, in response to your query about a vehicle's gas mileage, that it gets far more miles per gallon than your old 1956 Chevy. Leaving aside Cuba (where a long-running U.S. embargo hampers imports), there are about as many gas-guzzling '56 Chevys on the roads in 2013 as there are forensic psychologists relying on unstructured clinical judgment to perform risk assessments. 

Time to give up the ghost? 

Hart and Cooke recommend that forensic evaluators stop the practice of using these statistical algorithms to make "mechanistic" and "formulaic" predictions. They are especially critical of the practice of providing specific probabilities of recidivism, which are highly prejudicial and likely to be inaccurate.

"This actually isn’t a radical idea; until quite recently, leading figures in the field of forensic mental health [such as Tom Grisso and Paul Appelbaum] argued that making probabilistic predictions was questionable or even ill advised," they point out. “Even in fields where the state of knowledge is arguably more advanced, such as medicine, it is not routine to make individual predictions.”

They propose instead a return to evidence-based approaches that more holistically consider the individual and his or her circumstances:

From both clinical and legal perspectives, it is arbitrary and therefore inappropriate to rely solely on a statistical algorithm developed a priori - and therefore developed without any reference to the facts of the case at hand - to make decisions about an individual, especially when the decision may result in deprivation of liberties. Instead, good practice requires a flexible approach, one in which professionals are aware of and rely on knowledge of the scientific literature, but also recognize that their decisions ultimately require consideration of the totality of circumstances - not just the items of a particular test. 

In the short run, I am skeptical that this proposal will be accepted. The foundation underlying actuarial risk assessment may be hollow, but too much construction has occurred atop it. Civil commitment schemes rely upon actuarial tools to lend an imprimatur of science, and statutes in an increasing number of U.S. states mandate use of the Static-99 and related statistical algorithms in institutional decision-making.

The long-term picture is more difficult to predict. We may look back sheepishly on today's technocratic approaches, seeing them as emblematic of overzealous and ignorant pandering to public fear. Or -- more bleakly -- we may end up with a rigidly controlled society like that depicted in the sci-fi drama Gattaca, in which supposedly infallible scientific tests determine (and limit) the future of each citizen.

* * * * *

I recommend the article, "Another Look at the (Im-)Precision of Individual Risk Estimates Made Using Actuarial Risk Assessment Instruments." It's part of an upcoming special issue on violence risk assessment, and it provides a detailed discussion of the history and parameters of the debate. (Click HERE to request it from Dr. Hart.) Other articles in the debate include the following (in rough chronological order):
  • Hart, S. D., Michie, C. and Cooke, D. J. (2007a). Precision of actuarial risk assessment instruments: Evaluating the "margins of error" of group v. individual predictions of violence.  British Journal of Psychiatry, 190, s60–s65. 
  • Mossman, D. and Sellke, T. (2007). Avoiding errors about "margins of error" [Letter]. British Journal of Psychiatry, 191, 561. 
  • Harris, G. T., Rice, M. E. and Quinsey, V. L. (2008). Shall evidence-based risk assessment be abandoned? [Letter]. British Journal of Psychiatry, 192, 154. 
  • Cooke, D. J. and Michie, C. (2010). Limitations of diagnostic precision and predictive utility in the individual case: A challenge for forensic practice. Law and Human Behavior, 34, 259–274. 
  • Hanson, R. K. and Howard, P. D. (2010). Individual confidence intervals do not inform decision makers about the accuracy of risk assessment evaluations. Law and Human Behavior, 34, 275–281. 
*The experimental instrument used for this study was derived from the SVR-20, a structured professional judgment tool. The average recidivism rate among the total sample was 18 percent, with 10 percent of offenders in the low-risk group and 33 percent of those in the high-risk group reoffending. The instrument's Area Under the Curve, a measure of predictive validity, was .72, which is in line with that of other actuarial instruments.

January 20, 2013

Martin Luther King: Back by popular demand

For the third year in a row, in honor of Martin Luther King Day, I am excerpting a portion of a keynote speech the visionary civil rights leader delivered at the 1967 convention of the American Psychological Association. This was just seven months before he was gunned down, and at a time when he was drawing larger connections between racial oppression and the Vietnam War. I have made the entire speech, "The Role of the Behavioral Scientist in the Civil Rights Movement," available for download (HERE). Forensic psychologists who follow my blog have told me they appreciate it. Forty-six years old, the speech remains relevant today.

There are certain technical words in every academic discipline which soon become stereotypes and even clichés. Every academic discipline has its technical nomenclature. You who are in the field of psychology have given us a great word. It is the word maladjusted. This word is probably used more than any other word in psychology. It is a good word; certainly it is good that in dealing with what the word implies you are declaring that destructive maladjustment should be destroyed. You are saying that all must seek the well-adjusted life in order to avoid neurotic and schizophrenic personalities.

But on the other hand, I am sure that we will recognize that there are some things in our society, some things in our world, to which we should never be adjusted. There are some things concerning which we must always be maladjusted if we are to be people of good will. We must never adjust ourselves to racial discrimination and racial segregation. We must never adjust ourselves to religious bigotry. We must never adjust ourselves to economic conditions that take necessities from the many to give luxuries to the few. We must never adjust ourselves to the madness of militarism, and the self-defeating effects of physical violence....

Thus, it may well be that our world is in dire need of a new organization, The International Association for the Advancement of Creative Maladjustment. Men and women should be as maladjusted as the prophet Amos, who in the midst of the injustices of his day, could cry out in words that echo across the centuries, 'Let justice roll down like waters and righteousness like a mighty stream'; or as maladjusted as Abraham Lincoln, who in the midst of his vacillations finally came to see that this nation could not survive half slave and half free; or as maladjusted as Thomas Jefferson, who in the midst of an age amazingly adjusted to slavery, could scratch across the pages of history, words lifted to cosmic proportions, 'We hold these truths to be self evident, that all men are created equal. That they are endowed by their creator with certain inalienable rights. And that among these are life, liberty, and the pursuit of happiness.' And through such creative maladjustment, we may be able to emerge from the bleak and desolate midnight of man’s inhumanity to man, into the bright and glittering daybreak of freedom and justice.

I have not lost hope. I must confess that these have been very difficult days for me personally. And these have been difficult days for every civil rights leader, for every lover of justice and peace.

* * * * *

My longer excerpt, posted January 2011, is HERE. The full speech is HERE.