
December 16, 2012

Training: Controversies in sexually violent predator evaluations

I am excited to announce that the American Psychology-Law Society has accepted a panel that I put together on "Emergent controversies in civil commitment evaluations of sexually violent predators." I hope some of you will join me at the annual conference in Portland, Oregon on March 7-9.

The symposium will address three areas of controversy in the sex offender civil commitment field:
  • Mental abnormality and psychiatric diagnosis in court (my topic)
  • Recidivism risk assessment (addressed by my esteemed colleague Jeffrey Singer)
  • Volitional control (Frederick Winsmann, clinical instructor at Harvard Medical School, will present a promising new assessment model)
Here's the symposium abstract:
Over the past three decades, Sexually Violent Predator litigation has emerged as perhaps the most contentious area of forensic psychology practice. In an effort to assist the courts, a cadre of experts has proffered a confusing array of constantly changing assessment methods, psychiatric diagnoses, and theories of sex offending. Now, some federal and state courts are beginning to subject these often-competing claims to greater scrutiny, for example via Daubert and Frye evidentiary hearings. This symposium will alert forensic practitioners, lawyers and academics to some of the most prominent minefields on the SVP battleground, revolving around three central areas of contestation: psychiatric diagnosis, risk assessment, and the elusive construct of volitional control. The presenters will review recent scholarly literature and court rulings addressing: (1) the reliability and validity of psychiatric diagnoses in sexually dangerous person litigation, (2) forensic risk assessment tools and how risk data should be reported to triers of fact, and (3) how best to address the issue of volitional impairment, a Constitutionally required element for civil commitment. The focus will be on how to assist the courts while remaining within the limits of scientific knowledge and our profession's ethical boundaries.
The conference schedule hasn't been issued yet so I don’t know which day our panel is presenting, but I will keep you posted when I find out, probably in January. In the meantime, if you are looking to pick up Continuing Education (CE) credits, the pre-conference workshops are a good way to get some high-quality forensic training:
  • The ever-informative Randy Otto on "Improving Clinical Judgment and Decision Making in Forensic Psychological Evaluation," with a heavy focus on identifying and reducing bias (full-day workshop) 
  • Paul J. Frick on "Developmental Pathways to Conduct Disorder: Implications for Understanding and Treating Severely Aggressive and Antisocial Youth" (full-day workshop)
  • Amanda Zelechoski on "Trauma-Informed Care in Forensic Settings" (full-day workshop)
  • Kathy Pezdek on "How to Present Statistical Information to Judges and Jurors" (half-day workshop)
  • Steven Penrod on "Things That Jurors (and Judges) Ought to Know About Eyewitness Reliability" (half-day workshop)
Portland is a lovely city, especially in the spring, so register now, and mark your calendars for what is sure to be a lively and educational event.

October 18, 2012

Static-99R risk estimates wildly unstable, developers admit

The developers of the Static-99R, the most widely used risk assessment tool for sex offenders, have conceded that the instrument cannot provide accurate numerical estimates of the risk of sexual recidivism for any specific offender.

The startling admission was published in the current issue of Criminal Justice and Behavior.

Examining the data from the 23 separate groups (totaling 8,106 offenders) that cumulatively make up the instrument’s aggregate norms, the researchers found alarmingly large variability in risk estimates depending on the underlying sample. The problem was especially acute for offenders with higher risk scores. A few examples:
  • At a low Static-99R score of "2," an offender’s predicted sexual recidivism rate after 10 years ranged from a low of 3 percent to a high of 20 percent, depending on the sample.
  • A score of "5" led to a five-year recidivism estimate of 10 percent in a large, representative sample of Swedish sex offenders, but an estimate two and a half times as high, 25 percent, in one U.S. sample. The absolute differences for more extreme scores were even larger.
  • Conversely, the Static-99R score that would predict a 15 percent likelihood of recidivism after five years ranged from a low-risk score of "2" to a high-risk score of "8," an enormous difference (greater than two standard deviations).
The study’s authors -- Karl Hanson, Leslie Helmus, David Thornton, Andrew Harris and Kelly Babchishin -- concede that such large variability in risk estimates "could lead to meaningfully different conclusions concerning an offender’s likelihood of recidivism."

Overall risk lower than previously found

Despite the wide variations in rates of offending, the absolute recidivism rate for the typical sex offender in the combined samples was low overall. Across samples, the five-year recidivism rate for the typical sex offender was only about 7 percent (ranging from 4 to 12 percent), lower than had been reported in a previous meta-analysis. The 10-year risk for the typical offender ranged from 6 to 22 percent.

The research team speculates that the risk inflation in earlier analyses may have been an artifact of characteristics of the underlying samples, with data from higher-risk offenders more likely to be preserved and available for study. We know that a sister instrument, the MnSOST-R, produced inflated estimates of risk due to oversampling of high-risk offenders.

Will risk inflation continue?

MC Escher, "Hand with Reflecting Sphere"
The Static-99R has only a modest ability to discriminate recidivists from non-recidivists. Its so-called "Area Under the Curve" statistic of around .70 means that, if you were to randomly select one known recidivist and one non-recidivist from a group of offenders, there is about a 70 percent probability that the recidivist would have the higher score.
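
To make that rank-order interpretation concrete, here is a minimal Python sketch; the score distributions are invented for illustration and are not the Static-99R's actual data.

```python
import random

random.seed(1)

# Hypothetical score distributions: recidivists score somewhat higher on
# average, but the two groups overlap heavily (roughly what an AUC of
# .70 looks like).
recidivists = [random.gauss(4.5, 2) for _ in range(300)]
non_recidivists = [random.gauss(3.0, 2) for _ in range(300)]

# AUC = probability that a randomly drawn recidivist outscores a randomly
# drawn non-recidivist, counting ties as half.
wins = sum(
    1.0 if r > n else 0.5 if r == n else 0.0
    for r in recidivists
    for n in non_recidivists
)
auc = wins / (len(recidivists) * len(non_recidivists))
print(f"Probability a recidivist outscores a non-recidivist: {auc:.2f}")
```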

Such information about a test’s relative accuracy may be helpful when one is choosing which method to employ in doing a risk assessment. But there are a number of problems with relying on it when reporting one's assessment of a specific individual.

First of all, even that modest level of accuracy may be illusory. A study currently in progress is finding poor inter-rater agreement on scores in routine practice, especially at the higher risk levels.

Second, with base rates of recidivism hovering around 6 to 7 percent, even under optimal conditions it is very difficult to accurately predict who will reoffend. For every person correctly flagged as a recidivist based on a high Static-99R score, at least three non-recidivists will be falsely flagged, according to research by Jay Singh and others, as well as published error-rate calculations by forensic psychologists Gregory DeClue and Terence Campbell.
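
The arithmetic behind those error-rate calculations is easy to check. The sketch below applies Bayes' rule at a 6 percent base rate; the sensitivity and specificity are assumptions of mine for illustration, not published Static-99R operating characteristics.

```python
# Base-rate arithmetic behind the false-positive problem. The base rate
# comes from the article; the other two figures are assumed.
base_rate = 0.06     # ~6 percent of released sex offenders reoffend
sensitivity = 0.70   # assumed: share of future recidivists flagged high risk
specificity = 0.80   # assumed: share of non-recidivists correctly not flagged

true_positives = base_rate * sensitivity
false_positives = (1 - base_rate) * (1 - specificity)

ppv = true_positives / (true_positives + false_positives)
print(f"Chance a flagged offender actually reoffends: {ppv:.0%}")
print(f"Non-recidivists flagged per recidivist: {false_positives / true_positives:.1f}")
```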

Finally, and perhaps most importantly, telling a judge or jury how an offender compares with other offenders does not provide meaningful information about the offender’s actual risk. Indeed, such testimony can be highly misleading. For example, told that "Mr. Smith scored in the 97th percentile," judges and jurors may understandably believe this to be an estimate of actual risk, when the less frightening reality is that the person's odds of reoffending are far, far lower (probably no greater than 16 percent), even if he scores in the high-risk range. Seeing such statements in reports always flashes me back to a slim little treatise that was required reading in journalism school, How to Lie With Statistics.

Rather, what the trier of fact needs is a well calibrated test, such that predicted probabilities of recidivism match up with actual observed risk. The newly developed MnSOST-3 is promising in that regard, at least for offenders in Minnesota, where it was developed. In contrast, the popular Static-99 tools have always overestimated risk.
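
For readers who want to see what a calibration check involves, here is a minimal sketch with made-up numbers: a tool is well calibrated to the extent that its predicted probabilities match observed rates within each score band.

```python
# A calibration check compares predicted probabilities with observed
# outcome rates, score band by score band. All numbers here are invented.
predicted = {2: 0.05, 5: 0.15, 8: 0.35}   # tool's predicted 5-year rates
observed = {2: 0.04, 5: 0.11, 8: 0.20}    # observed rates on follow-up

for score in sorted(predicted):
    gap = predicted[score] - observed[score]
    verdict = "overestimates" if gap > 0 else "underestimates"
    print(f"Score {score}: predicted {predicted[score]:.0%}, "
          f"observed {observed[score]:.0%} -> {verdict} by {abs(gap):.0%}")
```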

When the Static-99 premiered, it featured a single table of misleadingly precise risk figures. High scorers were predicted to reoffend at a rate of 52 percent after 15 years, which made it easy for government evaluators to testify that an offender with a high score met the legal criteria required for civil commitment of being "likely" to reoffend.

The instrument’s developers now admit that this original risk table "turned out to be a gross simplification."

Indeed, with each of a series of new iterations over the past few years, the Static-99's absolute risk estimates have progressively declined, such that it would be difficult for the instrument to show high enough risk to support civil detention in most cases. However, in 2009 the developers introduced a new method that can artificially inflate risk levels by comparing an offender not to the instrument's aggregate norms, but to a specially created "high risk" subsample (or "reference group") with unusually high recidivism rates.

Some evaluators are using this method on any offender who is referred for possible civil commitment. For example, I was just reviewing the transcript of a government expert's testimony that he uses these special high-risk norms on offenders who are referred for "an administrative or judicial process." In some cases, this amounts to heaping prejudice upon prejudice. Let's suppose that an offender is referred in a biased manner, due to his race or sexual orientation (something that happens far more often than you might think, and will be the topic of a future blog post). Next, based solely on this referral, this individual's risk level is calculated using recidivism rates that are guaranteed to elevate his risk as compared with other, run-of-the-mill offenders. This method has not been peer reviewed or published, and there is no evidence to support its reliability or validity. Thus, it essentially amounts to the claim that the offender in question is at an especially high risk as compared with other offenders, just "because I (or we) say so." 

The admission of poor stability across samples should make it more difficult to claim that this untested procedure -- which assumes some level of commonality between the selected reference group and the individual being assessed -- is sufficiently accurate for use in legal proceedings. Given some of the sketchy practices being employed in court, however, I am skeptical that this practice will be abandoned in the immediate future.

The article is: "Absolute recidivism rates predicted by Static-99R and Static-2002R sex offender risk assessment tools vary across samples: A meta-analysis" by Leslie Helmus, R. Karl Hanson, David Thornton, Kelly M. Babchishin and Andrew J. R. Harris. Click HERE to request a copy from Dr. Hanson. 

October 4, 2012

Long-awaited HCR-20 update to premiere in Scotland

The long-awaited international launch of the third version of the popular HCR-20 violence risk assessment instrument has been announced for next April in Edinburgh, Scotland.

The HCR-20 is an evidence-based tool using the structured professional judgment method, an alternative to the actuarial method that predicts violence at least as well while giving a more nuanced and individualized understanding. It has been evaluated in 32 different countries and translated into 18 languages.

A lot has changed in the world of risk prediction since the second edition premiered 15 years ago. Perhaps the major change in the third edition is the elimination of the need to incorporate a Psychopathy Checklist (PCL-R) score; research determined that this did not add to the instrument's predictive validity. Additionally, like the sister instrument for sex offender risk assessment, the RSVP, the HCR:V3 will focus more heavily on formulating plans to manage and reduce a person's risk, rather than merely predicting violence.

The revision process took four years, with beta testing in England, Holland, Sweden and Germany. Initial reports show very high correlations with the second edition of the HCR-20, excellent interrater reliability, and promising validity as a violence prediction tool.

The HCR:V3 will be launched at a one-day conference jointly organized by The Royal Society of Edinburgh and Violence Risk Assessment Training. Developers Christopher Webster, Stephen Hart and Kevin Douglas will be on hand to describe the research on the new instrument and its utility in violence risk assessment.

More information on the April 15, 2013 training conference is available HERE. A Webinar PowerPoint on the revision process is HERE.

August 2, 2012

Violence risk instruments overpredicting danger

Tools better at screening for low risk than pinpointing high risk 


The team of Seena Fazel and Jay Singh is at it again, bringing us yet another gigantic review of studies on the accuracy of the most widely used instruments for assessing risk of violence and sexual recidivism.


This time, the prolific researchers -- joined by UK statistician Helen Doll and Swedish professor Martin Grann -- report on a total of 73 research samples comprising 24,847 people from 13 countries. Cumulatively, the samples had a high base rate of reoffense, with almost one in four reoffending over an average of about four years.

Bottom line: Risk assessment instruments are fairly good at identifying low risk individuals, but their high rates of false positives -- people falsely flagged as recidivists -- make them inappropriate “as sole determinants of detention, sentencing, and release.”

In all, about four out of ten of those individuals judged to be at moderate to high risk of future violence went on to violently offend. Prediction of sexual reoffense was even poorer, with less than one out of four of those judged to be at moderate to high risk going on to sexually offend. In samples with lower base rates, the researchers pointed out, predictive accuracy will be even poorer.

What that means, in practical terms, is that to stop one person who will go on to become violent again in the future, society must lock up at minimum one person who will NOT; for sex offenders, at least three non-recidivists must be detained for every recidivist. This, of course, is problematic from a human rights standpoint. 
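
The detention arithmetic follows directly from the positive predictive values, as this quick sketch shows (the PPV figures approximate the review's "four out of ten" and "less than one out of four").

```python
# Converting a positive predictive value (PPV) into the number of people
# wrongly detained per person rightly detained. Both PPVs are approximate.
for label, ppv in [("violence (PPV ~ 0.41)", 0.41),
                   ("sexual reoffense (PPV ~ 0.23)", 0.23)]:
    print(f"{label}: {(1 - ppv) / ppv:.1f} non-recidivists detained "
          f"per recidivist")
```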

Another key finding goes against conventional wisdom: Actuarial instruments that focus on historical risk factors performed no better than tools based on clinical judgment, contrary to some previous reviews.

The researchers included the nine most commonly used risk assessment tools, out of the many dozens that have now been developed around the world:
  • Level of Service Inventory-Revised (LSI-R) 
  • Psychopathy Checklist-Revised (PCL-R) 
  • Sex Offender Risk Appraisal Guide (SORAG) 
  • Static-99 
  • Violence Risk Appraisal Guide (VRAG) 
  • Historical, Clinical, Risk management-20 (HCR-20) 
  • Sexual Violence Risk-20 (SVR-20) 
  • Spousal Assault Risk Assessment (SARA) 
  • Structured Assessment of Violence Risk in Youth (SAVRY) 
Team leader Fazel, of Oxford University, and colleagues stressed several key implications of their findings:
One implication of these findings is that, even after 30 years of development, the view that violence, sexual, or criminal risk can be predicted in most cases is not evidence based. This message is important for the general public, media, and some administrations who may have unrealistic expectations of risk prediction for clinicians. 

A second and related implication is that these tools are not sufficient on their own for the purposes of risk assessment. In some criminal justice systems, expert testimony commonly uses scores from these instruments in a simplistic way to estimate an individual’s risk of serious repeat offending. However, our review suggests that risk assessment tools in their current form can only be used to roughly classify individuals at the group level, and not to safely determine criminal prognosis in an individual case. 

Finally, our review suggests that these instruments should be used differently. Since they had higher negative predictive values, one potential approach would be to use them to screen out low risk individuals. Researchers and policy makers could use the number safely discharged to determine the potential screening use of any particular tool, although its use could be limited for clinicians depending on the immediate and service consequences of false positives. 

A further caveat is that specificities were not high -- therefore, although the decision maker can be confident that a person is truly low risk if screened out, when someone fails to be screened out as low risk, doctors cannot be certain that this person is not low risk. In other words, many individuals assessed as being at moderate or high risk could be, in fact, low risk. 

My blog post on these researchers' previous meta-analytic study, Violence risk meta-meta: Instrument choice does matter, is HERE.

May 29, 2012

SVP risk tools show 'disappointing' reliability in real-world use

Rater agreement on three instruments commonly used to assess sex offenders' risk of recidivism is much lower in practice than reported in the tools' manuals, according to a new study out of Florida.

Faring most poorly was the Psychopathy Checklist (PCL-R). Correlations between the scores of two evaluators hired by the same agency were in the low range. On average, psychologists differed by five points on the instrument, which has a score range of zero to 40. In one case, two evaluators were a whopping 24 points apart!

Agreement among evaluators was only moderate on the Static-99 and the MnSOST-R, two actuarial risk assessment instruments for which scoring is relatively more straightforward.

The study, published in the respected journal Psychological Assessment, was a collaboration between scholars from the Department of Mental Health Law and Policy at the University of South Florida and researchers with the Florida Department of Children and Families. It utilized archived records culled from the almost 35,000 individuals screened for possible Sexually Violent Predator (SVP) civil commitment in Florida between 1999 and 2009. The researchers located 315 cases in which the same individual was evaluated by separate clinicians who each administered both the PCL-R and at least one of the two actuarial measures within a short enough time frame to enable direct scoring comparisons.
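
For the statistically inclined, here is a minimal sketch of how such paired scores can be summarized; the numbers are invented, not the Florida data, and the study itself may index agreement differently (for example, with intraclass correlations).

```python
from statistics import mean, stdev

# Invented paired PCL-R scores from two evaluators of the same offenders
# (NOT the study's data).
rater_a = [22, 30, 15, 28, 10, 25, 33, 18]
rater_b = [17, 32, 24, 21, 12, 30, 26, 14]

n = len(rater_a)
mean_a, mean_b = mean(rater_a), mean(rater_b)
r = (sum((a - mean_a) * (b - mean_b) for a, b in zip(rater_a, rater_b))
     / ((n - 1) * stdev(rater_a) * stdev(rater_b)))

print(f"Pearson r between raters: {r:.2f}")
print(f"Mean absolute disagreement: "
      f"{mean(abs(a - b) for a, b in zip(rater_a, rater_b)):.1f} points")
```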

It would be a mistake to lean too heavily on the results of a single isolated study. But the present study adds to a burgeoning body of research from several independent groups, all pointing to troubling problems with the accuracy of instruments designed to forecast risk of recidivism among sex offenders.

Related study: Psychopathy and sexual deviance not predictive

Collectively, the research has been especially critical of the ability of the highly prejudicial construct of psychopathy to add meaningfully to risk prediction in this high-stakes arena. Indeed, just this week another study has come out indicating that neither psychopathy scores nor sexual deviance measures improve on the accuracy provided by an actuarial instrument alone.

An especially interesting finding of that Canadian study is that reoffense rates were still below 12 percent over a 6-year followup period for even the most high-risk offenders -- those with high risk ratings on the Static-99R plus high levels of psychopathy and sexual deviance (as measured by phallometric testing). This makes it inappropriate to inflate risk estimates over and above those derived from Static-99R scores alone, the authors caution.

Item-level analysis finds varying rates of accuracy

A unique contribution of the Florida study is its analysis of the relative accuracy of every single item in each of the three instruments studied. Handy tables allow a forensic practitioner to see which items have the poorest reliability, meaning they should be viewed skeptically by forensic decision-makers.

For example, take the MnSOST-R, a now-defunct instrument with a score range of –14 to 31 points. The total gap between evaluators was as wide as 19 points; the items with the greatest variability in scoring were those pertaining to offenders' functioning during incarceration, such as participation in treatment.

Meanwhile, the weak performance of the Psychopathy Checklist owes much to the items on its so-called “Factor 1,” which attempt to measure the personality style of the psychopath. As I've discussed before, rating someone as “glib,” “callous” or “shallow” is a highly subjective enterprise that opens the door to a veritable avalanche of personal bias.

Piggy-backing off a recommendation by John Edens and colleagues, the Florida team suggests that the prejudicial deployment of the Psychopathy Checklist may be superfluous, in that scores on Factor 2 alone (the items reflecting a chronic criminal lifestyle) are more predictive of future violence or sexual recidivism.

Next up, we need to identify the causes of the poor interrater reliability for forensic risk prediction instruments in real-world settings. Is it due to inadequate training, differing clinical skills, variable access to collateral data, intentional or unintentional bias on the part of examiners, adversarial allegiance effects (not a factor in the present study, since both evaluators were appointed by the same agency), or some combination?

In the meantime, the fact that two evaluators working on the same side cannot reliably arrive at the same risk rating for any particular individual should certainly raise our skepticism about the validity of risk prediction based on these instruments.

The studies are:

Reliability of Risk Assessment Measures Used in Sexually Violent Predator Proceedings. Cailey Miller, Eva Kimonis, Randy Otto, Suzonne Kline and Adam Wasserman. Psychological Assessment. Advance online publication, 7 May 2012. Click HERE to contact the authors.

Does Consideration of Psychopathy and Sexual Deviance Add to the Predictive Validity of the Static-99R? Jan Looman, Nicola A. C. Morphett and Jeff Abracen. International Journal of Offender Therapy and Comparative Criminology. Published online 28 May 2012. Click HERE to contact the authors.



May 2, 2012

The homicidal triad: Predictor of violence or urban myth?

For at least half a century, legend has told of a "triad" of ominous childhood behaviors -- cruelty to animals, firesetting, and enuresis -- said to predict future violence.

The so-called "Macdonald triad" (also known as the homicidal triad or the Hellman and Blackman triad) is taught in criminology and psychology courses, used by forensic practitioners in assessing risk, and has even made its way into Law and Order: Special Victims Unit. In particular, it has become a staple among aficionados of the trendy serial killer.

But is the syndrome valid?

Providing the most definitive exploration to date is Kori Ryan, a former criminology student at California State University, Fresno, who delved into the "evolutionary history" of this tantalizing construct for her as-yet unpublished master's thesis. Her ultimate conclusion:
Even though the literature on violent behavior contains many references to the Macdonald triad (and its aliases), collectively these studies do not provide sufficient evidence of its ability to predict violence, nor, in fact, of its existence as a bona fide phenomenon.
Instead, childhood enuresis, firesetting and animal cruelty more likely represent three among many indicators of severe childhood abuse. In other words, the presence of one or more of these elements in the histories of some violent offenders can be explained by the fact that violent offenders are often the products of child abuse. More importantly, relying upon these behaviors as predictors of future violence would lead to many false positives, punishing children who might not be violent in the future.

One of many misleading websites

Roots of the legend 

Gulliver's Travels
Forensic psychiatrist John Macdonald is generally credited with "discovering" the triad. In a 1963 article in the American Journal of Psychiatry, entitled "The Threat to Kill," he gave his clinical impression that "a history of great parental brutality, extreme maternal seduction, or the triad of childhood firesetting, cruelty to animals and enuresis" can signal those who will eventually threaten homicide. His article was based on his work with 100 patients at the Colorado Psychopathic Hospital in Denver, Colorado who had threatened -- but not necessarily committed -- violence. 

Over the next few decades, the idea "attracted a dedicated following" and gradually expanded to encompass various forensic groups, including sexual sadists, recidivist firesetters and -- most salacious -- serial killers.

Ryan traces the history of cultural interest in these behaviors all the way back to Greek mythology and early Western fiction, such as Jonathan Swift's 1726 Gulliver's Travels, in which Gulliver puts out a fire with his own urine, much to the chagrin of the Imperial Majesty, thereby linking urination with fire and revenge.

Early psychoanalytic thinkers also placed heavy emphasis on these behaviors, seeing them as products of arrested psychosexual development and sublimated sexual and sadistic urges. Psychoanalyst Melanie Klein, for example, saw bedwetting as a daughter’s sadistic revenge against her mother.

Empirical research: Triad goes bust

Two psychiatrists were the first to empirically evaluate the Macdonald triad, according to Ryan. Studying 84 incarcerated offenders in 1966, Hellman and Blackman reported a positive association between the triad and future violence. Accordingly, some took to labeling the phenomenon the "Hellman and Blackman triad."

But subsequent attempts to replicate Hellman and Blackman's findings were unsuccessful. Even John Macdonald himself voiced later doubt about the triad's validity. After trying to test his own clinical theory, Macdonald reported in his 1968 book, Homicidal Threats, that he could find no statistically significant association between homicide perpetrators and early problems with firesetting, cruelty to animals, or enuresis.

Likewise, in an examination of 206 sex offenders at the Massachusetts Treatment Center for Sexually Dangerous Persons, Prentky and Carter (1984) found "no compelling evidence" for the idea that the triad predicted adult criminality. They did, however, note that the individual components of the triad were common among people raised in highly abusive home environments.

Some years later, this was also the conclusion of Jonathan Pincus, in his 2001 book on convicted murderers. Pincus described "a forensic assessment protocol in which bed-wetting, firesetting, and cruelty to animals (among other behaviors) are considered 'hallmarks' of childhood abuse," notes Ryan.

Indeed, it seems far more likely that one of Macdonald’s five original indicators that didn’t go on to fame has more explanatory power as a cause of later violence: parental brutality.

Dangerous ramifications

"The frequency with which discussions of violent offenders (of various types) include mention of the Macdonald triad suggests its general acceptance as a predictor of violent behavior," notes Ryan.

This continuing prominence owes in large part to the triad's promotion by prominent FBI profilers in the 1988 book, Sexual Homicide: Patterns and Motives. Like Macdonald’s, the FBI study was anecdotal, small-scale and lacking in any statistical analyses or control groups. Studying 36 sex killers, Douglas, Burgess and Ressler found that many manifested one or more elements of the triad. Unfortunately, notes Ryan, the authors did not report which factors were present in which subjects, or how many of these killers evidenced all three components of the triad.

Ryan warns that promotion of the triad has real-world ramifications, in that children who exhibit one or more of these behaviors "might be falsely labeled as potentially dangerous."

For example, police officers exposed to the triad in undergraduate criminology courses may target young offenders who have lit a fire or harmed an animal -- both fairly common behaviors among troubled youth -- as future sex fiends or serial killers. (Enuresis, with less face validity as an indicator of sadism, has tended to drop from more contemporary renditions of the triad.)

Ignoring the minuscule base rate of serial killers, even veterinarians are encouraged to identify those who hurt pet animals as potentially lethal: "Many known serial killers began their careers by hurting pet animals," warn the authors of a 2004 article in one veterinary journal. "It is well known in the criminology field that people who perpetrate acts of cruelty on animals, frequently escalate to torturing humans, usually the young and helpless."

Rather than throwing the baby out with the bathwater, Ryan says researchers could do more research to understand these behaviors in context. For example, might arson be a coping mechanism in children who have experienced severe emotional abuse, rather than a marker for future aggression? Are some elements of the triad indicators for future violence when they co-occur? More fundamentally, is there any set of behaviors that can legitimately be considered a behavioral syndrome predictive of later violence?

The study is: The Macdonald triad: Predictor of violence or urban myth? The abstract is HERE; the full text can be requested from the author via ResearchGate (HERE). The author, Kori Ryan, can be contacted HERE.*

*Links updated 12/1/16.

March 26, 2012

'Case of the missing militant' resolved

Attorney Paul Harris reads from To Kill A Mockingbird.* (Photo credit: San Jose Mercury)

A quick update on the case of Ronald Bridgeforth, the man I blogged about who turned himself in on shooting charges after 42 years underground: A judge in San Mateo County imposed a very reasonable sentence of one year in county jail. The judge also ordered Bridgeforth to work with at-risk youth in Alameda County (Oakland), California upon his release. That should be no problem for the 67-year-old former militant, who has dedicated his life to public service.

My original post, Predicting behavior: The case of the missing militant, is HERE. The San Mateo Times and The Daily Mail (UK) have more on the sentencing. A San Jose Mercury slide show is HERE.

*I don't know what passage from To Kill A Mockingbird the defense attorney was reading from at the sentencing hearing, but I am curious.

February 21, 2012

Treatment and risk among the most dangerous sex offenders

 Study questions need for lengthy treatment of detainees 

McNeil Island with prison ferry in foreground
McNeil Island is a lonesome place these days. In a cost-saving move, the state of Washington has shuttered the prison. The McNeil Island Correctional Center was the last of its kind, the twin sister of the more infamous Alcatraz Penitentiary in San Francisco Bay.

Back when I briefly worked there in the late 1990s, it was a rustic place, its forests and overgrown orchards teeming with deer and other wildlife. Now, it is dominated by a modern civil detention site housing about 284 sex offenders. Built at a cost of $60 million, the Special Commitment Center costs another $133 million* per year to run, at a time of massive cuts to essential public services.

Special Commitment Center (photo credit: Seattle Times)
Although Washington holds the distinction of housing civil detainees on a remote island that can only be reached by air or water, the state's larger quandary is not unique. Swept along by public panics and political posturing, 20 U.S. states have approved civil detention programs that are becoming costly albatrosses.

The 30 other U.S. states, as well as other countries around the world, are in a position to ridicule the obscenely high costs of indefinitely quarantining such small handfuls of offenders.

Our neighbors to the north are far more sensible, as it turns out. At the Regional Treatment Centre (RTC) in Kingston, Ontario, Canada, civil commitment is nonexistent, and the highest-risk sex offenders may be released after an average of just seven months of treatment.

And how many of those bad actors go on to sexually reoffend after their brief but intensive treatment?

Fewer than 6 percent, according to a new study. Although the study's 2.5-year follow-up period is relatively short, the findings echo those of a previous study by co-author Jeffrey Abracen and colleagues, finding that even after nine years, only about 10 percent of offenders released from the RTC had reoffended.

Comparing high-risk Canadian sex offenders with similarly dangerous offenders civilly committed in the U.S. state of Florida, the researchers found the two populations to be virtually identical. Of the 31 sex offenders released in Florida, only one (or 3 percent) sexually reoffended. Because so few sex offenders are being released from civil detention sites in the United States, it is difficult to accurately estimate how many of them might reoffend in the community; this study could help to fill this gap, by providing a proxy group.

The low recidivism rates in Canada after only brief treatment suggest that the interminable treatment regimens at U.S. civil commitment sites, which typically last for years and years, are "more cultural than practical," reflecting the U.S. propensity for severe punishment, according to the study's authors, Robin Wilson and Donald Pake Jr. of Florida and Jan Looman and Jeffrey Abracen of Canada. One downside of such protracted treatment is that offenders may become institutionalized, with negative effects on their personalities, the authors suggest.

The researchers highlighted the fact that despite being among the highest-risk sex offenders from their respective prison systems, both the Canadian and U.S. offenders reoffended at rates far below those predicted by the Static-99 and Static-99R, the most widely used actuarial instruments for predicting recidivism.

These researchers are not the only ones coming to the conclusion that the actuarial instruments drastically overpredict recidivism. In the state of Virginia, lawmakers are questioning the use of the Static-99 after noting that civil commitment recommendations shot up when the state began mandating use of the Static-99 in 2006, jumping from about 7 percent to 25 percent of all sex offenders being released from prison.

"When the test was designated in law in 2006, it was believed that a score of 5 meant that the offender was 32 percent likely to commit another sex crime," according to a news report. "Updates have brought that risk down to about 11 percent. Researchers say that even may be too high."

Echoing what many of us have been saying for several years now, a study by Virginia's Joint Legislative Audit and Review Commission, the investigative arm of that state's General Assembly, concluded that the Static-99 is not all that accurate for assessing the risk of specific individuals, as opposed to groups.

Rather than scrapping the civil commitment program altogether, and saving itself a cool $23 million per year, the first state to mandate the Static-99 almost did a 180, nearly becoming the first state to abandon the instrument entirely. Proposed legislation would have "eliminate[d] the use of the Static-99 assessment instrument" for civil commitment purposes. For some reason, though, that language was removed from the most current version of House Bill 1271.

Stay tuned. As more solid research begins to overtake the hype, these and other political skirmishes are likely to become more common in financially desperate states. Eventually, I predict the entire civil commitment enterprise will hit the scrap pile as did the old sexual psychopath laws of the 1950s, but not before 20 U.S. states and the federal government squander many, many more millions of public dollars.

The study is: Comparing Sexual Offenders at the Regional Treatment Centre (Ontario) and the Florida Civil Commitment Center by Robin Wilson, Jan Looman, Jeffrey Abracen and Donald Pake Jr., forthcoming from the International Journal of Offender Therapy and Comparative Criminology. To request a copy of this article, you may email co-author Jan Looman (CLICK HERE). Thank you, Dr. Looman.

*See comment by Becky, below, who found the exact cost in the current state budget.

January 8, 2012

More developments on the sex offender front

Study finds problems with real-world reliability of Static-99

Evaluators differ almost half of the time in their scoring of the most widely used risk assessment instrument for sex offenders, the Static-99, according to a report in the current issue of Criminal Justice and Behavior. Even a one-point difference on the instrument can have substantial practical implications, both for individual sex offenders and for public policy. In by far the largest and most ecologically valid study of interrater agreement in Static-99 scoring, the research examined paired risk ratings for about 700 offenders in Texas and New Jersey. The findings call into question the typical practice of reporting only a single raw score, without providing confidence intervals that would take into account measurement error. The study, the latest in a line of similar research by Marcus Boccaccini, Daniel Murrie and colleagues, can be requested HERE.
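
To see why confidence intervals matter, consider the standard error of measurement, which converts a reliability coefficient into a margin of error around an individual score. The values below are assumptions chosen for illustration, not figures from the study.

```python
from math import sqrt

# All three inputs are assumed values, not data from the study.
observed_score = 5   # a Static-99 raw score as reported in an evaluation
sd = 2.0             # assumed standard deviation of scores in the population
reliability = 0.80   # assumed interrater (field) reliability coefficient

sem = sd * sqrt(1 - reliability)   # standard error of measurement
low, high = observed_score - 1.96 * sem, observed_score + 1.96 * sem
print(f"95% confidence interval around a score of {observed_score}: "
      f"{low:.1f} to {high:.1f}")
```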

California reining in SVP cowboys

Psychiatrist Allen Frances has more news coverage of a memorable state-sponsored training at which Sexually Violent Predator (SVP) evaluators were cautioned to be more prudent in their diagnostic practices. Ronald Mihordin, MD, JD, acting clinical director of the Department of Mental Health program, warned evaluators against cavalierly diagnosing men who have molested teenagers with “hebephilia” and rapists with “paraphilias not otherwise specified-nonconsent,” unofficial diagnoses not found in the current edition of the American Psychiatric Association's Diagnostic and Statistical Manual of Mental Disorders. California evaluators have come under fire in the past for billing upwards of $1 million per year conducting SVP evaluations of paroling prisoners. The PowerPoints of the 3-day training are now available online, at the DMH's website.

The neuroscience of sex offending

In preventive detention trials of sex offenders, forensic evaluators often testify about whether an offender lacks volitional control over his conduct. But how much do we really know about this? In the current issue of Aggression and Violent Behavior, forensic psychologist John Matthew Fabian explores the neuroscience literature on sex offending as it applies to civil commitment proceedings. The article can be viewed online, or requested from the author HERE.

Challenge to sex offender registry

Although the sex offender niche is by far the most partisan and contentious in forensic psychology, one thing that just about all informed professionals agree about is that sex offender registration laws do more harm than good. By permanently stigmatizing individuals, they hamper rehabilitation and reintegration; as Elizabeth Berenguer Megale of the Barry University School of Law explores in an essay in the Journal of Law and Social Deviance (full-text available HERE), they lead to a form of “social death.” Now, the California Coalition on Sexual Offending (CCOSO) and the Association for the Treatment of Sexual Abusers (ATSA) have filed a joint amicus brief in a challenge to California's "Jessica's Law," which bars registered sex offenders from living within 2,000 feet of any school or park. The amicus contends that the restriction is punishment without any rational purpose, in that it does not enhance public safety or deter future criminality. The challenge was brought by Steven Lloyd Mosley. After a jury found Mosley guilty of misdemeanor assault, a non-registerable offense, the sentencing judge ordered him to register anyway, ruling that the assault was sexually motivated. The 4th District Court of Appeal granted Mosley’s appeal, and the California Department of Corrections has appealed to the state's supreme court. We'll have to wait and see whether the high court will tackle the issue of registration laws directly, or will sidestep with a narrow, technical ruling.

November 27, 2011

MnSOST-3: Promising new actuarial for sex offenders to debut

Note: See below postscript for a link to the MnSOST-3 instrument and manual, now available online.

Regular readers know that I've criticized our field's overreliance on imprecise and atheoretical screening instruments to predict whether or not an individual will behave violently in the future.

As Patrick Lussier and Garth Davies of Simon Fraser University point out in the current issue of Psychology, Public Policy, and Law, the actuarialist approach of searching for external variables that distinguish individuals "is somewhat at odds with the rationale of risk assessment, which is intended to assess the risk of an individual but also takes into account any changes in the level of risk over time for a specific individual."

In their new longitudinal study, Lussier and Davies identified heterogeneous trajectories in sexual and violent offending over time. They suggest that by turning "a blind eye" to criminological research on the developmental course of offending, the actuarialists have produced measures that are a misfit for many if not most individuals, overestimating risk in some cases and underestimating it in others.

While I agree philosophically with their critique, we have to be realistic.

Legislatures and courts love the so-called actuarials, which rate an individual's risk based on the presence of various preselected risk factors. They're quick and easy to administer. And they offer an illusion of scientific certitude that legitimizes current laws and criminal justice practices.

So, until a more theoretically informed, person-oriented approach gains traction, we should at minimum insist on more accurate actuarials, and better acknowledgment of their limitations. That was the goal, for instance, of the Multisample Age-Stratified Table of Sexual Recidivism Rates (MATS-1), a collaborative project by researchers in the United States, New Zealand, and Australia to more accurately incorporate advancing age into predictions of risk for sex offenders.

With that more modest goal in mind, I am cautiously optimistic about a newly developed actuarial tool for assessing recidivism among sex offenders, the MnSOST-3.

A better actuarial?

Before you recoil in shock based on the name alone, let me reassure you that it's a completely different tool from the old Minnesota Sex Offender Screening Tool (the MnSOST or MnSOST-R). Only three of the new instrument's items are the same, and even those are measured differently, so I don't even know why they kept the tainted name. As many of you know, the original MnSOST (pronounced MIN-sauced) oversampled high-risk offenders and so produced artificially inflated estimates of risk. Also, research on its development was never published in a peer-reviewed journal.

Based on an article by developers Grant Duwe and Pamela Freske accepted for publication in the journal Sexual Abuse, the new and improved MnSOST-3 appears to have several advantages over existing actuarial instruments for assessing sex offender recidivism.

The developers took advantage of advances in statistical modeling, using a predictive logistic regression model that enables a more nuanced measurement of the effects of continuous predictors such as age and number of prior offenses. Risk is adjusted based on whether an offender will be under any kind of supervision in the community, something other actuarials do not consider. Scoring is done on an Excel spreadsheet, which should reduce data entry errors.
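
For readers unfamiliar with the approach, here is a minimal sketch of logistic regression risk modeling with continuous predictors; the data are synthetic and the coefficients are my assumptions, not the actual MnSOST-3 weights.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2315  # same size as the reported development sample

# Synthetic predictors and an assumed relationship: risk declines with
# age at release and rises with the number of prior offenses.
age = rng.uniform(18, 70, n)
priors = rng.poisson(1.5, n)
logit = -3.5 - 0.03 * (age - 40) + 0.5 * priors
reoffended = rng.random(n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([age, priors])
model = LogisticRegression().fit(X, reoffended)

# A continuous predictor yields a smooth, individualized probability
# rather than a handful of score bins.
print(f"Age 35, two priors: {model.predict_proba([[35, 2]])[0, 1]:.1%}")
print(f"Age 60, two priors: {model.predict_proba([[60, 2]])[0, 1]:.1%}")
```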

A major strength of the MnSOST-3 is that it was developed on a contemporary sample that included 2,315 sex offenders released from Minnesota prisons between 2003 and 2006. Given the plummeting rates of sexual offending in the Western world over the past couple of decades, this is imperative in order not to overestimate risk.

The developers report that the MnSOST-3 is well calibrated with actual recidivism rates for all but the highest-risk offenders, for whom it overestimates risk. In other words, the predicted probabilities of recidivism match up pretty closely with the actual rates of reoffending except for the very highest-risk offenders. Overall, about four percent of the released offenders were reconvicted of a new sex crime within four years, a base rate that is consistent with other recent research findings. 

Moose Lake sex offender facility, Minnesota
The authors frankly acknowledge the problem that this low base rate poses for accurate identification of recidivists. While offenders who scored in the top 10 percent on the MnSOST-3 were more likely than lower-scoring men to reoffend (their rate of reconviction was 22 percent), if you predicted that any given individual in this top bracket would reoffend, you would be wrong four times out of five.

The optimism-corrected accuracy of the MnSOST-3 for the contemporary sample, as measured by the Area Under the Curve (AUC) statistic, was .796. This means that there is about an 80% chance that a randomly selected recidivist will have a higher score on the instrument than a randomly selected non-recidivist -- although this applies only to the sample from which the instrument was developed and is not generalizable to other samples.
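
For the curious, "optimism correction" is a bootstrap technique for estimating how much a model's apparent accuracy is inflated by evaluating it on its own development data. Here is a minimal sketch of the general method on synthetic data; it is not the developers' actual procedure or sample.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))                     # synthetic predictors
y = rng.random(500) < 1 / (1 + np.exp(-(X @ [0.8, 0.5, 0.3] - 2.5)))

# "Apparent" AUC: the model evaluated on the same data it was fit to.
apparent = roc_auc_score(y, LogisticRegression().fit(X, y).predict_proba(X)[:, 1])

# Optimism: how much better a model looks on its own bootstrap resample
# than on the full sample, averaged over many resamples.
optimism = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))
    m = LogisticRegression().fit(X[idx], y[idx])
    auc_boot = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    auc_full = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(auc_boot - auc_full)

print(f"Apparent AUC:           {apparent:.3f}")
print(f"Optimism-corrected AUC: {apparent - np.mean(optimism):.3f}")
```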

Although we must wait to see whether this moderate accuracy will generalize to sex offender populations outside of Minnesota, the MnSOST-3 may be about as good as it gets. After a decades-long search for the Holy Grail of risk prediction, consensus is building that the obstacles are insurmountable. Low base rates of recidivism, along with fluid and unpredictable environmental contexts, place a firm ceiling on predictive accuracy.

Which gets us back to the point made by Lussier and Davies: Consistent with a large body of criminological theory, we need to recognize the criminal career as a process with a beginning, a middle and an end. In other words, it's time to start looking at the individual offender and understanding his specific offense trajectory, rather than just continuing to amass collections of external variables to measure him against.

Oh, in case you were wondering how well the old MnSOST-R did at predicting which men in the contemporary Minnesota sample would reoffend: It had an AUC of .55, scarcely better than a coin flip.

So, if nothing else, the MnSOST-3 should seal the death warrant of its worn-out ancestors. Given their inaccurate and bloated estimates of risk, that will be a very good thing.

The articles are:

Lussier, Patrick and Davies, Garth (2011) A Person-Oriented Perspective on Sexual Offenders, Offending Trajectories, and Risk of Recidivism: A New Challenge for Policymakers, Risk Assessors, and Actuarial Prediction? Psychology, Public Policy, and Law 17 (4), 530–561. (To request a copy from the author, click HERE.)

Duwe, Grant and Freske, Pamela (In Press), Using Logistic Regression Modeling to Predict Sex Offense Recidivism: The Minnesota Sex Offender Screening Tool-3 (Mnsost-3), Sexual Abuse. (To request a copy from the author, click HERE.)

POSTSCRIPT: The MnSOST-3 is now being used by the Minnesota Department of Corrections; thus, the instrument and the scoring manual are available online -- HERE


November 20, 2011

Psychology rife with inaccurate research findings

The case of a Dutch psychologist who fabricated experiments out of whole cloth for at least a decade is shining a spotlight on systemic flaws in the reporting of psychological research.

Diederik Stapel, a well-known and widely published psychologist in the Netherlands, routinely falsified data and made up entire experiments, according to an investigative committee.

But according to Benedict Carey of the New York Times, the scandal is just one in a string of embarrassments in "a field that critics and statisticians say badly needs to overhaul how it treats research results":
In recent years, psychologists have reported a raft of findings on race biases, brain imaging and even extrasensory perception that have not stood up to scrutiny…. 
Dr. Stapel was able to operate for so long, the committee said, in large measure because he was “lord of the data,” the only person who saw the experimental evidence that had been gathered (or fabricated). This is a widespread problem in psychology, said Jelte M. Wicherts, a psychologist at the University of Amsterdam. In a recent survey, two-thirds of Dutch research psychologists said they did not make their raw data available for other researchers to see. "This is in violation of ethical rules established in the field," Dr. Wicherts said.
In a survey of more than 2,000 American psychologists scheduled to be published this year, Leslie John of Harvard Business School and two colleagues found that 70 percent had acknowledged, anonymously, to cutting some corners in reporting data. About a third said they had reported an unexpected finding as predicted from the start, and about 1 percent admitted to falsifying data.
Also common is a self-serving statistical sloppiness. In an analysis published this year, Dr. Wicherts and Marjan Bakker, also at the University of Amsterdam, searched a random sample of 281 psychology papers for statistical errors. They found that about half of the papers in high-end journals contained some statistical error, and that about 15 percent of all papers had at least one error that changed a reported finding -- almost always in opposition to the authors' hypothesis….
Forensic implications

While inaccurate and even fabricated findings make the field of psychology look silly, they take on potentially far more serious ramifications in forensic contexts, where the stakes can include six-figure payouts or extreme deprivations of liberty.

For example, claims based on fMRI brain-scan studies are increasingly being allowed into court in both criminal and civil contexts. Yet, a 2009 analysis found that about half of such studies published in prominent scientific journals were so "seriously defective" that they amounted to voodoo science that "should not be believed."

Similarly, researcher Jay Singh and colleagues have found that meta-analyses purporting to show the efficacy of instruments used to predict who will be violent in the future are plagued with problems, including failure to adequately describe study search procedures, failure to check for overlapping samples or publication bias, failure to investigate the confound of sample heterogeneity, and use of a problematic statistical technique, the Area Under the Curve (AUC), to measure predictive accuracy.

Particularly troubling to me is a brand-new study finding that researchers' willingness to share their data is directly correlated with the strength of the evidence and the quality of reporting of statistical results. (The analysis is available online from the journal PloS ONE.)

I have heard about several researchers in the field of sex offender risk assessment who stubbornly resist efforts by other researchers to obtain their data for reanalysis. As noted by Dr. Wicherts, the University of Amsterdam psychologist, this is a violation of ethics rules. Most importantly, it makes it impossible for us to be confident about the reliability and validity of these researchers' claims. Despite this, potentially unreliable instruments -- some of them not even published -- are routinely introduced in court to establish future dangerousness.

Critics say the widespread problems in the field argue strongly for mandatory reforms, including the establishment of policies requiring that researchers archive their data to make it available for inspection and analysis by others. This reform is important for the credibility of psychology in general, but absolutely essential in forensic psychology.

Hat tips: Ken Pope and Jane

New article of related interest:

False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant
Psychological Science (November 2011)
Joseph Simmons, Leif Nelson, and Uri Simonsohn (click on any of the authors' names to request a copy)

From the abstract: This article show[s] that despite empirical psychologists' nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis.
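
The computer simulations are easy to reproduce in spirit. Here is a minimal sketch of a single "researcher degree of freedom" -- optional stopping -- using a crude |t| > 2 criterion as a stand-in for p < .05; it illustrates the phenomenon rather than reproducing the authors' code.

```python
import random
from math import sqrt
from statistics import mean, stdev

def looks_significant(a, b):
    """Crude two-sample test: |t| > 2 as a stand-in for p < .05."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / se > 2

random.seed(0)
trials, false_positives = 2000, 0
for _ in range(trials):
    # No true effect: both groups drawn from the same distribution.
    a = [random.gauss(0, 1) for _ in range(20)]
    b = [random.gauss(0, 1) for _ in range(20)]
    hit = looks_significant(a, b)
    if not hit:  # optional stopping: quietly add 10 more per group, retest
        a += [random.gauss(0, 1) for _ in range(10)]
        b += [random.gauss(0, 1) for _ in range(10)]
        hit = looks_significant(a, b)
    false_positives += hit

# With just this one degree of freedom, the rate lands reliably above
# the nominal 5 percent.
print(f"False-positive rate: {false_positives / trials:.1%}")
```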

November 6, 2011

Call for papers on violence risk assessment

The field of violence risk assessment has expanded rapidly over the past several decades. But despite a plethora of new risk assessment tools, confusion abounds as to how to understand their accuracy and utility. And controversy is growing over how well these tools actually predict violence in the individual case.

To address these gaps, forensic scholars John Petrila and Jay Singh of the University of South Florida have teamed up to edit a special issue of the respected journal Behavioral Sciences and the Law on the topic of "measuring and interpreting the predictive validity of violence risk assessment."

The goal of the special issue is to provide a comprehensive and accessible resource for researchers, clinicians, and policymakers interested in the measurement of predictive validity or the use of such findings in clinical or legal practice.

The editors invite empirical and conceptual papers on the measurement of predictive validity as it relates to violence risk assessment. In addition, papers focusing on the implications of the measurement of predictive validity for public protection and individual liberty are also welcome, as are legal perspectives on these issues.

Papers should be no longer than 35 pages, including tables, figures and references. The deadline for submissions is July 1, 2012. Authors should send two electronic copies of any submission, one blinded for peer review, to John Petrila, JD or Jay P. Singh, PhD.

October 5, 2011

Combating the pull to overpredict violence

Like the moon's effect on tides, the pull to overpredict violence exerts a powerful influence, even on seasoned forensic evaluators who know its strength.

When directly informed that an event has a low base rate of occurrence -- for example, that a homicide offender has only a 1 in 100 likelihood of being arrested for another homicide -- both laypeople and professionals will markedly overpredict violence.

In an article in the Journal of the American Academy of Psychiatry and Law, eminent forensic psychologist Stanley Brodsky and postdoctoral fellow Sarah L. Miller analyze why this is so.

For one thing, the risk of underpredicting violence has more potential to negatively impact the evaluator: bad publicity, public outrage, even civil litigation, not to mention the harm committed by a high-risk individual who reoffends.

Far safer to "err on the side of public safety," goes clinical lore. A claim of dangerousness is well nigh impossible to disprove. And especially in the context of civil commitment of sex offenders, the issue is not framed as punishment but, rather, as "an acceptable restriction of individual rights in the interest of public safety and rehabilitation." It's not as if these guys are sympathetic characters, with a constituency of supporters looking out for their rights.

Certain psychological mechanisms also contribute to bias in the direction of overpredicting risk. These include confirmation bias, or seeking information to support a preconceived conclusion, and illusory correlation, in which the evaluator assumes two things are related just because they co-occurred.

The purpose of Brodsky and Miller's well-argued review is to make evaluators more aware of the natural overprediction tendency, and to provide a checklist that evaluators can use to assess and correct their potential biases.

It's a great idea, although I am a bit skeptical that such a simple approach will make much of an impact in the adversarial arena.

The full article is available for free download HERE.

September 23, 2011

Forensic trainings on the Eastern Seaboard

Oct. 2: Fun-filled training in New York

Stephen Morse
The New York State Psychological Association's Forensic Division is holding a one-day conference that some are billing as the best single-day conference this year. Keynote speaker Stephen J. Morse, JD, PhD, will open the day with a talk on “Folk Psychology: The Key to Legally Relevant Forensic Communication.” The day will end with a 2-hour moot court and then a wine social. Sandwiched in between are presentations by:
  • William Barr, PhD on “Evaluating Competency: A Neuropsychological Perspective”
  • Michael Perlin, JD on “There Must Be Some Way Out of Here: Why the Convention on the Rights of Persons with Disabilities is Potentially the Best Weapon in the Fight Against Sanism in Forensic Facilities”
  • Joseph Plaud, PhD on “Psychological Assessment of Sexual Offenders: Where We’ve Been and Where We’re Going”
  • David Martindale, PhD on “A Reviewer’s Take on Custody Evaluations”
Michael Perlin
The conference is being held at the Faculty House at Columbia University in New York City, which I am told is a great venue. The full conference program is HERE. Registration is HERE. Get it while it's hot.

Oct. 14: Risk management in the community

Well-known forensic psychologist Kirk Heilbrun of Drexel University is the featured presenter at this Forensic Mental Health Symposium sponsored by the Institute of Law, Psychiatry, and Public Policy at the University of Virginia. The event will be held at the Crowne Plaza Richmond West in Richmond, Virginia. More information is available at the ILPPP website. To register, click HERE.

Nov. 4: Police custody and the interrogation of youth

This Advanced Seminar in Juvenile Forensic Practice, also sponsored by the ILPPP, features several interesting speakers, including:
  • Lawrence Fitch, Esq on The Rights of Juveniles in Delinquency Cases: Understanding the Principles of Miranda Waiver and the Admissibility of Confessions in Juvenile Court
  • Dick Reppucci, PhD on Research on the Police Interrogation of Juveniles
  • Gregg McCrary (FBI, Retired) on Controversial Juvenile Cases: Evidence, Testimony and Outcomes
The event will be held at the University of Virginia in Charlottesville. To register, click HERE.

Next semester, the ILPPP is planning an advanced workshop on evaluating sanity with Ira Packer, and a workshop on Motivational Interviewing with David Prescott. Check back with their website for updates on those, and more.