IN THE NEWS: Search results for Jay Singh

Showing posts sorted by relevance for query Jay Singh. Sort by date Show all posts

February 9, 2014

Risk researchers launching premium literature service

The Alliance for International Risk Research (AIRR) is launching an excellent new risk assessment resource for mental health, correctional, and legal professionals. The AIRR Executive Bulletin is being called "an exceptional resource that lawyers on both sides, judges, examiners and the rest of us practitioners in these areas of forensic mental health, treatment and law should subscribe to in order to begin to implement a uniform body of current literature and ‘Best Practices’ that continually updates going forward to facilitate development of a legal and constitutional body of law."

The subscription-based service is designed for busy professionals who want to stay up to date but simply do not have the time to locate and read the voluminous literature published each month. It aggregates research on risk assessment for violence, sex offending and general recidivism among adults and juveniles.

AIRR researchers Jay Singh and Kevin Douglas

An expert team led by top risk assessment researchers Jay Singh and Kevin Douglas systematically searches more than 80 journals and identifies every new risk assessment article published each month. The average is around 20 articles. Doctors Singh and Douglas then purchase and read every article and write a one-page, easy-to-digest summary without statistical jargon.

In addition to the monthly summary of literature, subscribers also get four online risk assessment training seminars per year from top clinical researchers, and an exclusive monthly interview with an industry leader. It's a convenient way to get continuing education credits, because the trainings are eligible for American Psychological Association and Canadian Psychological Association credits.

A sample issue is available HERE.

You can sign up for either an individual or a group subscription HERE. For questions, contact lead reasearcher Jay Singh (HERE).

June 13, 2013

International violence risk researchers launch free news service

I don't know about you, but I find it incredibly hard to keep up with the burgeoning research in risk assessment. In this era of international fear and carceral control, disciplines from psychology to criminology to nursing to juvenile justice are cranking out more articles each month, and the deluge can feel overwhelming.

Fortunately, two prominent researchers are offering to help us stay organized and up to speed -- for free. The newly created Alliance for International Risk Research (AIRR) will send out a monthly email containing references to all new articles related to forensic risk assessment from over 80 scholarly journals. And all you have to do is sign up.

Jay Singh and Kevin Douglas, AIRR Editors-in-Chief

The AIRR is brought to you by Jay Singh and Kevin Douglas. Dr. Singh, a newly appointed professor and senior researcher for the Department of Justice in Switzerland, is one of the best and brightest around (I've featured his important research on violence risk assessment more than once on this blog); Dr. Douglas is an award-winning psychology professor at Simon Fraser University and co-author of the widely used HCR-20 violence risk assessment tool, among others.

Their goal is to keep clinicians, policymakers, and researchers up to date in this rapidly evolving field, thus promoting evidence-based practices in the mental health and criminal justice systems. For articles published in languages other than English, the AIRR even boasts an "international coordinator" who will help disseminate the research to a global audience.

Signing up is easy: Just go to THIS LINK and provide your email contract information. The AIRR promises not to bother you with solicitations, survey participation requests or conference announcements -- "simply the latest risk-related research at your fingertips."

Don't delay! The first AIRR bulletin will be arriving in inboxes on Sept. 1.

July 6, 2014

Innovative international risk assessment service is expanding

Try your hand at answering these questions:

When evaluating Aboriginal offenders, how valid are standard risk assessment protocols?

Among Canadian men, how well does the Danger Assessment (DA) predict domestic violence?

For sex offenders in Vermont, what instrument is more accurate than the widely used Static-99 for predicting recidivism?

In screening U.S. soldiers coming back from Afghanistan, is there a valid tool that would help allocate limited therapeutic resources in order to decrease violence risk?

Finally, what the heck are the Y-ARAT, the CuRV, the START, and the VIO-SCAN, and what (if anything) are they good for?

With the frenetic pace of risk assessment research and practice developments, you couldn't be faulted for not knowing the correct answers to all of the above questions. Hardly anyone does.

That’s where the Executive Bulletin comes in.

Back in February, I told you about the launch of this service for clinicians, attorneys and researchers who want to stay abreast of developments in the field of risk assessment. The publishers scour more than 80 professional journals and create a one-page summary for each article relevant to violence and sex offending risk assessment among adults and juveniles. Using an appealing, easy-to-read format, each summary highlights the study's clinical implications and relevant legal questions, while minimizing statistical jargon.

In the months since my announcement, the Bulletin has been gaining traction around the world. It now reaches more than 11,000 practitioners, researchers, and policymakers in the United States, Australia, China, Hong Kong, Spain, Germany, Canada, the United Kingdom, Argentina, Israel, the Netherlands, Mexico, Lithuania, Norway and Denmark. Among its largest subscribers are the California Department of State Hospitals -- which requires that its forensic evaluators read each monthly issue in order to stay abreast of peer-reviewed research into evidence-based practice -- and the public-policy oriented Council of State Governments in the United States.

The newly rebranded Global Institute of Forensic Research (GIFR), with the ever-energetic forensic psychologist Jay Singh at its helm, is currently rolling out a new features and services, including a new website, a podcast version of the Bulletin for commuters, and expert risk assessment trainings (free to subscribers) that are eligible for continuing education credits from the American Psychological Association and the Canadian Psychological Association.

The service is subscription-based. At $35 per month (and $350 for group subscriptions) it isn’t cheap, but Dr. Singh points out that the alternatives are also costly. It’s both costly and time consuming to stay abreast of important risk-related articles from more than 80 journals, most of them fee-based. Thus, without a synthesizing service such as the Bulletin, practitioners risk falling behind and inadvertently violating relevant standards of practice.

Among my own main concerns if I am going to allow someone else to find and synthesize research for my consumption is the degree of fidelity and expertise that the reviewer brings to bear. Here, the field is fortunate to have someone upon whom we can confidently rely. What I find most valuable about the Bulletin is the level of critical analysis that the expert reviewers bring to bear on each of the 15 or 20 articles they summarize each month. (Indeed, my confidence is why I accepted an invitation a while back to serve on the Institute’s advisory board.)

Singh, an epidemiology professor at Molde University in Norway, has published more than 40 cutting-edge articles on violence prediction (a few of which I have featured in prior blog posts). Formerly a fellow of the Florida Mental Health Institute and a Senior Researcher in Forensic Psychiatry for the Swiss Department of Corrections in Zurich, he has also trained and lectured widely on mental illness and violence, including at Harvard, Yale, Columbia, Cornell, Brown, Dartmouth, and the University of Pennsylvania.

To date, his Institute has conducted exclusive interviews on tips and tricks in forensic assessment with leading practitioners and scholars including Jodi Viljoen, Nicholas Scurich, Annelies Vredeveldt and -- most recently -- Jennifer Lanterman of the University of Nevada at Reno. Next month’s featured expert is Seena Fazel of Oxford University. You can browse the website and find a sample issue HERE.

If you decide to sign up (or, better yet, get your institution to sign up), Singh is offering my blog readers and subscribers a special 10 percent discount. Just click on THIS LINK, and enter the discount code INTHENEWS.

September 14, 2011

Violence risk in schizophrenics: Are forensic tools reliable predictors?

The high-profile cases of Jared Lee Loughner and Anders Behring Breivik have contributed to high public demand for accurate prediction of violence potential among the mentally ill. While the number of risk assessment tools designed for this purpose has exploded in the past two decades, no systematic review has been conducted to investigate how accurate these tools are for predicting risk in individuals with schizophrenia.

But never fear: Jay Singh of the University of Oxford and colleagues (whose recent meta-review questioned overbroad claims about the accuracy of actuarials in risk assessment) have stepped into the breach, this time examining whether existing tools have proven efficacy for this task.

Reporting in this month's special issue of Schizophrenia Bulletin on violence and schizophrenia, the authors state that despite the existence of at least 158 structured tools for predicting outpatient violence risk, only two studies have measured instruments' predictive validity in discharged patients diagnosed with schizophrenia.

Instead of reporting on instruments' accuracy for specific patient groups, most studies report predictive validity estimates for heterogeneous groups of psychiatric patients. This forces clinicians and the public to assume that these group-level data apply to any individual diagnostic group.This assumption turns out to be a problem, due in part to the large differences in base rates of violence in psychiatric patients. We know, for example, that individuals with substance abuse disorders are more prone to violence, in general, than those diagnosed with major depression.

Examining the psychometric and predictive features of 10 widely used tools for assessing risk in mentally disordered offenders and civil psychiatric patients, the authors found "little direct evidence to support the use of these risk assessment tools in schizophrenia, specifically."

Overall, schizophrenics have low base rates of violence, with an estimated prevalence of between 10 and 15 percent. As I've discussed here in the context of sex offenders, the rarer a behavior is, the harder it is to successfully predict, leading to erroneous predictions of high risk in people who are not truly dangerous. The authors quote another research finding that in order to prevent one stranger homicide by a schizophrenic, governments would need to detain a whopping 35,000 patients.

That sounds to me like a black swan problem.

As in their previous meta-meta-analysis, the authors critique the almost exclusive use of the area under the curve (AUC) statistic to validate risk assessment instruments. Proponents of the AUC like it because it measures predictive utility independent of the base rate of the behavior in question. But this is as much a weakness as a strength, leading to a false sense of confidence in our ability to accurately predict the risk of individuals in heterogeneous groups of patients:

"High" AUC values for heterogeneous groups of psychiatric patients may have led researchers, clinicians, and policymakers to believe that instruments perform well for all diagnostic groups. However, it is problematic to suggest that structured instruments would be able to identify high-risk individuals with the same accuracy in groups with higher and lower base rates of violence.

In another interesting finding, Singh and colleagues found that the item content of violence risk tools varies markedly, with many tools including unique factors not contained in other instruments. This is a problem, unless these items are truly correlated with risk.

The authors call for updated reviews of the risk and protective factors underlying violence in different psychiatric groups -- including, for example, executive dysfunction in schizophrenics -- before additional risk assessment tools are constructed.

The review is available by contacting Dr. Singh (click HERE), who shortly will be coming to America to accept a post with the Mental Health Law and Policy Department of the University of South Florida.

Related blog post: Violence risk meta-meta: Instrument choice does matter (June 19, 2011)

June 19, 2011

Violence risk meta-meta: Instrument choice does matter

Despite popularity, psychopathy test and actuarials not superior to other prediction methods

The past couple of decades have seen an explosion of interest in forensic assessment of risk for future violent and sexual recidivism. Accordingly, evaluators can now choose from an array of more than 120 different risk assessment tools. But should this choice be based on individual preference, or are some instruments clearly superior to others?

Several systematic reviews and metaanalyses have addressed this question, but their conclusions often conflict. In the first systematic review of these reviews (called a “meta-review”), Jay Singh and Seena Fazel of Oxford University found that methodological shortcomings may contribute to the confusion. Problems they identified in the 40 metaanalyses and reviews they studied included authors' failure to adequately describe their study search procedures, failure to check for overlapping samples or publication bias, and failure to investigate the confound of sample heterogeneity.

The Oxford scholars, along with Martin Grann of Sweden's Centre for Violence Prevention, set out to rectify this problem via a more methodologically rigorous meta-review, using optimal data analyses and reporting procedures. For this purpose, they used the Preferred Reporting Items for Systematic Reviews and Metaanalyses, a 27-item checklist designed to enable a transparent and consistent reporting of results.

For their meta-meta (a metaanalysis of the metaanalyses), they collected data from 68 studies involving about 26,000 participants in 13 countries, focusing on the accuracy of the nine most commonly used forensic risk assessment instruments:

Psychopathy Checklist (PCL-R)
Static-99
Historical, Clinical, Risk Management-20 (HCR-20)
Violence Risk Appraisal Guide (VRAG)
Sexual Violence Risk-20 (SVR-20)
Level of Service Inventory (LSI-R)
Sex Offender Risk Appraisal Guide (SORAG)
Spousal Assault Risk Assessment (SARA)
Structured Assessment of Violence Risk in Youth (SAVRY)

Big differences in predictive validity

As it turns out, these widely used instruments vary substantially in predictive accuracy. Performing the best was the SAVRY, a risk assessment instrument designed for use with adolescents. At the bottom were the Level of Service Inventory and the Psychopathy Checklist. This is not too surprising, as the LSI-R is used with a wide variety of general offenders, and the PCL-R was not designed for risk prediction in the first place.

The present metaanalysis would therefore argue against the view of some experts that the PCL- R is unparalleled in its ability to predict future offending.

Statistical method matters: DOR outperforms AUC

The researchers compared several different methods of measuring predictive accuracy. They found that a popular statistic called the Area Under the Curve (AUC) was the weakest. Use of the AUC statistic may help to explain why some metaanalyses were unable to find significant differences among instruments, the authors theorize.

Better methods for comparing instruments’ predictive accuracy include calculating positive and negative predictive values and also using something called the Diagnostic Odds Ratio, or DOR. This is the ratio of the odds of a positive test result in an offender (true positive) relative to the odds of a positive result in a non-offender (false positive). The authors’ summary performance scores pooled results from all four statistical methods.

Actuarials not superior; race also matters

The poor performance of the Psychopathy Checklist (PCL-R) was not the only finding that may surprise some forensic evaluators. The researchers also found no evidence that actuarial tools – such as the widely touted Static-99 – outperform structured clinical judgment methods like the HCR-20 or the SVR-20.

They also found that an offender's race is critical to predictive accuracy. Risk assessment instruments perform best on white offenders, most likely because white offenders predominate in the underlying studies. This is consistent with other research, including a study by Dernevick and colleagues finding that risk assessment instruments are poor at predicting misconduct in terrorists.

Caution is therefore warranted when using any risk assessment tool to predict offending in samples dissimilar to their validation samples, the authors stress.

This systematic review appears to be the most methodologically rigorous such study to date, in a rapidly evolving field. I recommend obtaining both articles (see below for author contact information) and studying them carefully. The stakes are high, and it behooves us to use the instruments that are the most accurate for the specific purpose at hand.

The studies are:

Singh, J.P., et al., A comparative study of risk assessment tools: A systematic review and metaregression analysis of 68 studies involving 25,980 participants, Clinical Psychology Review (2011), doi:10.1016/j.cpr.2010.11.009. [Contact the author.]

Singh, J.P. and Fazel, S., Forensic Risk Assessment: A Metareview, Criminal Justice and Behavior (2010) [Contact the author.]

November 6, 2011

Call for papers on violence risk assessment

The field of violence risk assessment has expanded rapidly over the past several decades. But despite a plethora of new risk assessment tools, confusion abounds as to how to understand their accuracy and utility. And controversy is growing over how well these tools actually predict violence in the individual case.

To address these gaps, forensic scholars John Petrila and Jay Singh of the University of South Florida have teamed up to edit a special issue of the respected journal, Behavioral Sciences and the Law on the topic of "measuring and interpreting the predictive validity of violence risk assessment."

The goal of the special issue is to provide a comprehensive and accessible resource for researchers, clinicians, and policymakers interested in the measurement of predictive validity or the use of such findings in clinical or legal practice.

The editors invite empirical and conceptual papers on the measurement of predictive validity as it relates to violence risk assessment. In addition, papers focusing on the implications of the measurement of predictive validity for public protection and individual liberty are also welcome, as are legal perspectives on these issues.

Papers should be no longer than 35 pages, including tables, figures and references. The deadline for submissions is July 1, 2012. Authors should send two electronic copies of any submission, one blinded for peer review, to John Petrila, JD or Jay P. Singh, PhD.

September 4, 2013

'Authorship bias' plays role in research on risk assessment tools, study finds

Reported predictive validity higher in studies by an instrument's designers than by independent researchers

The use of actuarial risk assessment instruments to predict violence is becoming more and more central to forensic psychology practice. And clinicians and courts rely on published data to establish that the tools live up to their claims of accurately separating high-risk from low-risk offenders.

But as it turns out, the predictive validity of risk assessment instruments such as the Static-99 and the VRAG depends in part on the researcher's connection to the instrument in question.


Publication bias in pharmaceutical research has been well documented

Published studies authored by tool designers reported predictive validity findings around two times higher than investigations by independent researchers, according to a systematic meta-analysis that included 30,165 participants in 104 samples from 83 independent studies.

Conflicts of interest shrouded

Compounding the problem, in not a single case did instrument designers openly report this potential conflict of interest, even when a journal's policies mandated such disclosure.

As the study authors point out, an instrument’s designers have a vested interest in their procedure working well. Financial profits from manuals, coding sheets and training sessions depend in part on the perceived accuracy of a risk assessment tool. Indirectly, developers of successful instruments can be hired as expert witnesses, attract research funding, and achieve professional recognition and career advancement.

These potential rewards may make tool designers more reluctant to publish studies in which their instrument performs poorly. This "file drawer problem," well established in other scientific fields, has led to a call for researchers to publicly register intended studies in advance, before their outcomes are known.

The researchers found no evidence that the authorship effect was due to higher methodological rigor in studies carried out by instrument designers, such as better inter-rater reliability or more standardized training of instrument raters.

"The credibility of future research findings may be questioned in the absence of measures to tackle these issues," the authors warn. "To promote transparency in future research, tool authors and translators should routinely report their potential conflict of interest when publishing research investigating the predictive validity of their tool."

The meta-analysis examined all published and unpublished research on the nine most commonly used risk assessment tools over a 45-year period:

Historical, Clinical, Risk Management-20 (HCR-20)
Level of Service Inventory-Revised (LSI-R)
Psychopathy Checklist-Revised (PCL-R)
Spousal Assault Risk Assessment (SARA)
Structured Assessment of Violence Risk in Youth (SAVRY)
Sex Offender Risk Appraisal Guide (SORAG)
Static-99
Sexual Violence Risk-20 (SVR-20)
Violence Risk Appraisal Guide (VRAG)

Although the researchers were not able to break down so-called "authorship bias" by instrument, the effect appeared more pronounced with actuarial instruments than with instruments that used structured professional judgment, such as the HCR-20. The majority of the samples in the study involved actuarial instruments. The three most common instruments studied were the Static-99 and VRAG, both actuarials, and the PCL-R, a structured professional judgment measure of psychopathy that has been criticized criticized for its vulnerability to partisan allegiance and other subjective examiner effects.

This is the latest important contribution by the hard-working team of Jay Singh of Molde University College in Norway and the Department of Justice in Switzerland, (the late) Martin Grann of the Centre for Violence Prevention at the Karolinska Institute, Stockholm, Sweden and Seena Fazel of Oxford University.

A goal was to settle once and for all a dispute over whether the authorship bias effect is real. The effect was first reported in 2008 by the team of Blair, Marcus and Boccaccini, in regard to the Static-99, VRAG and SORAG instruments. Two years later, the co-authors of two of those instruments, the VRAG and SORAG, fired back a rebuttal, disputing the allegiance effect finding. However, Singh and colleagues say the statistic they used, the receiver operating characteristic curve (AUC), may not have been up to the task, and they "provided no statistical tests to support their conclusions."

Prominent researcher Martin Grann dead at 44

Sadly, this will be the last contribution to the violence risk field by team member Martin Grann, who has just passed away at the young age of 44. His death is a tragedy for the field. Writing in the legal publication Das Juridik, editor Stefan Wahlberg noted Grann's "brilliant intellect" and "genuine humanism and curiosity":

Martin Grann came in the last decade to be one of the most influential voices in both academic circles and in the public debate on matters of forensic psychiatry, risk and hazard assessments of criminals and ... treatment within the prison system. His very broad knowledge in these areas ranged from the law on one hand to clinical therapies at the individual level on the other -- and everything in between. This week, he would also debut as a novelist with the book "The Nightingale."

The article, Authorship Bias in Violence Risk Assessment? A Systematic Review and Meta-Analysis, is freely available online via PloS ONE (HERE).

Related blog reports:

Violence risk instruments overpredicting danger (Aug. 2, 2012)
Violence risk in schizophrenics: Are forensic tools reliable predictors? (Sept. 14, 2011)
Violence risk meta-meta: Instrument choice does matter (June 19, 2011)
Psychology rife with inaccurate research findings (Nov. 20, 2011)
International violence risk researchers launch free news service (June 13, 2013)

November 20, 2011

Psychology rife with inaccurate research findings

The case of a Dutch psychologist who fabricated experiments out of whole cloth for at least a decade is shining a spotlight on systemic flaws in the reporting of psychological research.

Diederik Stapel, a well-known and widely published psychologist in the Netherlands, routinely falsified data and made up entire experiments, according to an investigative committee.

But according to Benedict Carey of the New York Times, the scandal is just one in a string of embarrassments in "a field that critics and statisticians say badly needs to overhaul how it treats research results":

In recent years, psychologists have reported a raft of findings on race biases, brain imaging and even extrasensory perception that have not stood up to scrutiny….

Dr. Stapel was able to operate for so long, the committee said, in large measure because he was “lord of the data,” the only person who saw the experimental evidence that had been gathered (or fabricated). This is a widespread problem in psychology, said Jelte M. Wicherts, a psychologist at the University of Amsterdam. In a recent survey, two-thirds of Dutch research psychologists said they did not make their raw data available for other researchers to see. "This is in violation of ethical rules established in the field," Dr. Wicherts said.

In a survey of more than 2,000 American psychologists scheduled to be published this year, Leslie John of Harvard Business School and two colleagues found that 70 percent had acknowledged, anonymously, to cutting some corners in reporting data. About a third said they had reported an unexpected finding as predicted from the start, and about 1 percent admitted to falsifying data.

Also common is a self-serving statistical sloppiness. In an analysis published this year, Dr. Wicherts and Marjan Bakker, also at the University of Amsterdam, searched a random sample of 281 psychology papers for statistical errors. They found that about half of the papers in high-end journals contained some statistical error, and that about 15 percent of all papers had at least one error that changed a reported finding -- almost always in opposition to the authors' hypothesis….

Forensic implications

While inaccurate and even fabricated findings make the field of psychology look silly, they take on potentially far more serious ramifications in forensic contexts, where the stakes can include six-figure payouts or extreme deprivations of liberty.

For example, claims based on fMRI brain-scan studies are increasingly being allowed into court in both criminal and civil contexts. Yet, a 2009 analysis found that about half of such studies published in prominent scientific journals were so "seriously defective" that they amounted to voodoo science that "should not be believed."

Similarly, researcher Jay Singh and colleagues have found that meta-analyses purporting to show the efficacy of instruments used to predict who will be violent in the future are plagued with problems, including failure to adequately describe study search procedures, failure to check for overlapping samples or publication bias, failure to investigate the confound of sample heterogeneity, and use of a problematic statistical technique, the Area Under the Curve (AUC), to measure predictive accuracy.

Particularly troubling to me is a brand-new study finding that researchers' willingness to share their data is directly correlated with the strength of the evidence and the quality of reporting of statistical results. (The analysis is available online from the journal PloS ONE.)

I have heard about several researchers in the field of sex offender risk assessment who stubbornly resist efforts by other researchers to obtain their data for reanalysis. As noted by Dr. Wicherts, the University of Amsterdam psychologist, this is a violation of ethics rules. Most importantly, it makes it impossible for us to be confident about the reliability and validity of these researchers' claims. Despite this, potentially unreliable instruments -- some of them not even published -- are routinely introduced in court to establish future dangerousness.

Critics say the widespread problems in the field argue strongly for mandatory reforms, including the establishment of policies requiring that researchers archive their data to make it available for inspection and analysis by others. This reform is important for the credibility of psychology in general, but absolutely essential in forensic psychology.

Related blog posts:

Beware "voodoo" brain science (March 10, 2009)
Violence risk meta-meta: Instrument choice does matter (June 19, 2011)

Hat tips: Ken Pope and Jane

New article of related interest:

False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant

Psychological Science (November 2011)
Joseph Simmons, Leif Nelson, and Uri Simonsohn (click on any of the authors' names to request a copy)

From the abstract: This article show[s] that despite empirical psychologists' nominal endorsement of a low rate of false-positive findings (≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis.

October 18, 2012

Static-99R risk estimates wildly unstable, developers admit

The developers of the widely used Static-99R risk assessment tool for sex offenders have conceded that the instrument is not accurate in providing numerical estimates of risk for sexual recidivism for any specific offender.

The startling admission was published in the current issue of Criminal Justice and Behavior.

Examining the data from the 23 separate groups (totaling 8,106 offenders) that cumulatively make up the instrument’s aggregate norms, the researchers found alarmingly large variability in risk estimates depending on the underlying sample. The problem was especially acute for offenders with higher risk scores. A few examples:

At a low Static-99R score of "2," an offender’s predicted sexual recidivism rate after 10 years ranged from a low of 3 percent to a high of 20 percent, depending on the sample.

A score of "5" led to a recidivism estimate after five years of 10 percent in a large, representative sample of Swedish sex offenders, but a 250 percent higher risk, of 25 percent, in one U.S. sample. The absolute differences for more extreme scores were even larger.

Conversely, the Static-99R score that would predict a 15 percent likelihood of recidivism after five years ranged from a low-risk score of "2" to a high-risk score of "8," an enormous difference (greater than two standard deviations).

The study’s authors -- Karl Hanson, Leslie Helmus, David Thornton, Andrew Harris and Kelly Babchishin -- concede that such large variability in risk estimates "could lead to meaningfully different conclusions concerning an offender’s likelihood of recidivism."

Overall risk lower than previously found

Despite the wide variations in rates of offending, the absolute recidivism rate for the typical sex offender in the combined samples was low overall. The rate of recidivism among typical sex offenders after five years was only 7 percent or less (with a range of 4 to 12 percent), lower than had been reported in a previous meta-analysis. The 10-year risk ranged from 6 to 22 percent for the typical offender.

The research team speculates that the risk inflation in earlier analyses may have been an artifact of characteristics of the underlying samples, with data from higher-risk offenders more likely to be preserved and available for study. We know that a sister instrument, the MnSOST-R, produced inflated estimates of risk due to oversampling of high-risk offenders.

Will risk inflation continue?

The Static-99R has a very modest ability to discriminate recidivists from non-recidivists. Its so-called "Area Under the Curve" statistic of around .70 means that, if you were to randomly select one known recidivist and one non-recidivist from a group of offenders, there is about a 70 percent probability that someone who will reoffend will have a higher score than someone who won’t.

Such information about a test’s relative accuracy may be helpful when one is choosing which method to employ in doing a risk assessment. But there are a number of problems with relying on it when reporting one's assessment of a specific individual.

First of all, even that level of reliability may be illusory. A study that is currently in progress is finding poor inter-rater agreement on scores in routine practice, especially at the higher risk levels.

Second, with base rates of recidivism hovering around 6 to 7 percent, even under optimal conditions it is very difficult to accurately predict who will reoffend. For every person correctly flagged as a recidivist based on a high Static-99R score, at least three non-recidivists will be falsely flagged, according to research by Jay Singh and others, as well as published error-rate calculations by forensic psychologists Gregory DeClue and Terence Campbell.

Finally, and perhaps most importantly, telling a judge or jury how an offender compares with other offenders does not provide meaningful information about the offender’s actual risk. Indeed, such testimony can be highly misleading. For example, told that "Mr. Smith scored in the 97th percentile," judges and jurors may understandably believe this to be an estimate of actual risk, when the less frightening reality is that the person's odds of reoffending are far, far lower (probably no greater than 16 percent), even if he scores in the high-risk range. Seeing such statements in reports always flashes me back to a slim little treatise that was required reading in journalism school, How to Lie With Statistics.

Rather, what the trier of fact needs is a well calibrated test, such that predicted probabilities of recidivism match up with actual observed risk. The newly developed MnSOST-3 is promising in that regard, at least for offenders in Minnesota, where it was developed. In contrast, the popular Static-99 tools have always overestimated risk.

When the Static-99 premiered, it featured a single table of misleadingly precise risk figures. High scorers were predicted to reoffend at a rate of 52 percent after 15 years, which made it easy for government evaluators to testify that an offender with a high score met the legal criteria required for civil commitment of being "likely" to reoffend.

The instrument’s developers now admit that this original risk table "turned out to be a gross simplification."

Indeed, with each of a series of new iterations over the past few years, the Static-99's absolute risk estimates have progressively declined, such that it would be difficult for the instrument to show high enough risk to support civil detention in most cases. However, in 2009 the developers introduced a new method that can artificially inflate risk levels by comparing an offender not to the instrument's aggregate norms, but to a specially created "high risk" subsample (or "reference group") with unusually high recidivism rates.

Some evaluators are using this method on any offender who is referred for possible civil commitment. For example, I was just reviewing the transcript of a government expert's testimony that he uses these special high-risk norms on offenders who are referred for "an administrative or judicial process." In some cases, this amounts to heaping prejudice upon prejudice. Let's suppose that an offender is referred in a biased manner, due to his race or sexual orientation (something that happens far more often than you might think, and will be the topic of a future blog post). Next, based solely on this referral, this individual's risk level is calculated using recidivism rates that are guaranteed to elevate his risk as compared with other, run-of-the-mill offenders. This method has not been peer reviewed or published, and there is no evidence to support its reliability or validity. Thus, it essentially amounts to the claim that the offender in question is at an especially high risk as compared with other offenders, just "because I (or we) say so."

The admission of poor stability across samples should make it more difficult to claim that this untested procedure -- which assumes some level of commonality between the selected reference group and the individual being assessed -- is sufficiently accurate for use in legal proceedings. Given some of the sketchy practices being employed in court, however, I am skeptical that this practice will be abandoned in the immediate future.

The article is: "Absolute recidivism rates predicted by Static-99R and Static-2002R sex offender risk assessment tools vary across samples: A meta-analysis" by Leslie Helmus, R. Karl Hanson, David Thornton, Kelly M. Babchishin and Andrew J. R. Harris. Click HERE to request a copy from Dr. Hanson.

August 2, 2012

Violence risk instruments overpredicting danger

Tools better at screening for low risk than pinpointing high risk

The team of Seena Fazel and Jay Singh are at it again, bringing us yet another gigantic review of studies on the accuracy of the most widely used instruments for assessing risk of violence and sexual recidivism.

This time, the prolific researchers -- joined by UK statistician Helen Doll and Swedish professor Martin Grann -- report on a total of 73 research samples comprising 24,847 people from 13 countries. Cumulatively, the samples had a high base rate of reoffense, with almost one in four reoffending over an average of about four years.

Bottom line: Risk assessment instruments are fairly good at identifying low risk individuals, but their high rates of false positives -- people falsely flagged as recidivists -- make them inappropriate “as sole determinants of detention, sentencing, and release.”

In all, about four out of ten of those individuals judged to be at moderate to high risk of future violence went on to violently offend. Prediction of sexual reoffense was even poorer, with less than one out of four of those judged to be at moderate to high risk going on to sexually offend. In samples with lower base rates, the researchers pointed out, predictive accuracy will be even poorer.

What that means, in practical terms, is that to stop one person who will go on to become violent again in the future, society must lock up at minimum one person who will NOT; for sex offenders, at least three non-recidivists must be detained for every recidivist. This, of course, is problematic from a human rights standpoint.

Another key finding that goes against conventional wisdom was that actuarial instruments that focus on historical risk factors perform no better than tools based on clinical judgment, a finding contrary to some previous review.

The researchers included the nine most commonly used risk assessment tools, out of the many dozens that have now been developed around the world:

Level of Service Inventory-Revised (LSI-R)
Psychopathy Checklist-Revised (PCL-R)
Sex Offender Risk Appraisal Guide (SORAG)
Static-99
Violence Risk Appraisal Guide (VRAG)
Historical, Clinical, Risk management-20 (HCR-20)
Sexual Violence Risk-20 (SVR-20)
Spousal Assault Risk Assessment (SARA)
Structured Assessment of Violence Risk in Youth (SAVRY)

Team leader Fazel, of Oxford University, and colleagues stressed several key implications of their findings:

One implication of these findings is that, even after 30 years of development, the view that violence, sexual, or criminal risk can be predicted in most cases is not evidence based. This message is important for the general public, media, and some administrations who may have unrealistic expectations of risk prediction for clinicians.

A second and related implication is that these tools are not sufficient on their own for the purposes of risk assessment. In some criminal justice systems, expert testimony commonly uses scores from these instruments in a simplistic way to estimate an individual’s risk of serious repeat offending. However, our review suggests that risk assessment tools in their current form can only be used to roughly classify individuals at the group level, and not to safely determine criminal prognosis in an individual case.

Finally, our review suggests that these instruments should be used differently. Since they had higher negative predictive values, one potential approach would be to use them to screen out low risk individuals. Researchers and policy makers could use the number safely discharged to determine the potential screening use of any particular tool, although its use could be limited for clinicians depending on the immediate and service consequences of false positives.

A further caveat is that specificities were not high -- therefore, although the decision maker can be confident that a person is truly low risk if screened out, when someone fails to be screened out as low risk, doctors cannot be certain that this person is not low risk. In other words, many individuals assessed as being at moderate or high risk could be, in fact, low risk.

The study, Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24,827 people: systematic review and meta-analysis, is published in the British Medical Journal and is freely available online (HERE).

My blog post on these researchers' previous meta-analytic study, Violence risk meta-meta: Instrument choice does matter, is HERE.