by Alexander Talbott

The United States leads the world with over 2.2 million people incarcerated, a 500 percent increase from 50 years ago. This rapid expansion of the criminal justice system is due to many factors, one of the most prominent being high recidivism rates. Across the country, many individuals repeatedly interact with the criminal justice system, often through misdemeanors and other minor offenses. These individuals tend to be the most vulnerable in society: 64 percent suffer from mental illnesses, 10 percent experienced homelessness in the year before their arrest, and 55 percent struggled with substance abuse issues. In 2005, for example, it was estimated that there were three times as many individuals with mental illnesses in jails and prisons as in hospitals, and recidivism rates for the mentally ill are as high as 70 percent. Furthermore, many of these individuals' first interactions with social services occur within the criminal justice system, producing a reactive system of last resort that is overburdened, expensive, and ineffective.

There are many possible approaches to reducing recidivism. Researchers have found that diversionary programs reduce jail time, improve participant outcomes, and reduce costs without increasing public risk. Monitored mental health treatment for inmates has also shown statistically significant reductions in recidivism. However, these solutions address the cycle of recidivism reactively, helping individuals avoid future criminal behavior only after they are convicted. A proactive approach is to intervene with at-risk individuals by offering the support of social services agencies, connecting them with the assistance they need before the onset of criminal behavior. However, social services agencies are often underfunded and must prioritize their help toward individuals most at risk of repeat offenses. Existing approaches to identifying these people include Criminal Justice Coordination Committees, groups composed of various local criminal justice, legal, and social services departments with the goal of increasing interagency communication, and the Sequential Intercept Model, a planning tool that helps communities trace how at-risk individuals move through the criminal justice system. What these tools lack, however, is a way to quickly and accurately predict who the highest-risk individuals in their community are.

Rayid Ghani, a professor in the Machine Learning Department at Carnegie Mellon University, is working on a new approach to solving this societal problem. In two studies, one conducted in Los Angeles County and the other in Johnson County, Kansas, Professor Ghani and his fellow researchers have built machine learning models that accurately predict individuals at risk of recidivism. This information allows social services departments to intervene before the cycle of recidivism takes hold and provide individuals with the help they need.

How Can Machine Learning Models Reduce Recidivism?

Broadly speaking, the research teams' machine learning models are sophisticated algorithms that predict when someone is at risk of recidivism. This knowledge is invaluable for a social services department looking to prioritize its limited resources toward helping the most at-risk individuals. The models give agencies the information to act through individualized interventions, including diversion programs, conditional plea agreements, or stayed sentencing.

The models are binary classifiers, meaning they output either "yes" or "no": a prediction of whether the individual will be booked into the system in the next 6 to 12 months. The variables used to make this prediction, known as features, are drawn from various sources, including previous criminal charges, prior court case outcomes, interactions with mental health services, interactions with emergency medical services, and demographic data like race, age, and gender.
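A binary classifier of this kind can be sketched in miniature. The feature names, weights, and threshold below are hypothetical placeholders for illustration, not values from the studies:

```python
import math

def predict_booking_risk(features, threshold=0.5):
    """Toy binary classifier: return ("yes"/"no", probability) for a
    booking in the next 6-12 months. Weights are invented, not learned."""
    # Hypothetical feature weights; a real model learns these from data.
    weights = {
        "prior_charges": 0.8,
        "mental_health_contacts": 0.5,
        "ems_contacts": 0.4,
        "age": -0.03,
    }
    bias = -2.0
    score = bias + sum(weights[k] * features.get(k, 0) for k in weights)
    probability = 1 / (1 + math.exp(-score))  # logistic squashing to [0, 1]
    return ("yes" if probability >= threshold else "no", probability)
```

A record with several prior charges and service contacts would score "yes," while a record with none would score "no"; the real models draw on far richer administrative data than this sketch suggests.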

To assess the accuracy of their models, the teams used the overall rate of recidivism as a baseline, the rationale being that a model that selects at-risk individuals at random would achieve approximately the same accuracy as the current overall rate of recidivism. Against that baseline, the results of the models are quite promising. In the Los Angeles County model, 73 percent of individuals predicted to be at risk were later involved in a booking, compared to the baseline rate of 4.4 percent. In Johnson County, that rate was 51 percent, compared to a baseline of about 25 percent. While these experiments are limited in scope, the results are significant and indicate that these models have the potential to be powerful tools in proactively addressing recidivism risk.
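This baseline comparison boils down to a simple ratio, often called "lift" in machine learning. The percentages below come from the article; the lift framing itself is added here for illustration:

```python
def lift(model_precision, baseline_rate):
    """How many times better the model's hit rate is than flagging
    individuals at random (which would match the baseline rate)."""
    return model_precision / baseline_rate

# Los Angeles County: 73% of flagged individuals were later booked,
# versus a 4.4% baseline -- roughly a 16x improvement over chance.
la_lift = lift(0.73, 0.044)

# Johnson County: 51% versus a 25% baseline -- roughly a 2x improvement.
johnson_lift = lift(0.51, 0.25)
```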

Predictive Modeling in Criminal Justice

Justifiably, many people have deep reservations about using machine learning or artificial intelligence in the criminal justice system. Tools like predictive policing and Clearview AI's facial recognition software have been shown to be ineffective and harmful to minority groups. Similarly, algorithms used by courts to deny bail affect minorities at disproportionately high rates.

This issue was particularly pertinent to Professor Ghani and the teams as they conducted their research. An algorithm can be used in a punitive manner, such as in predictive policing or to make a targeted arrest. On the other hand, a nearly identical algorithm can be employed to direct social service agencies to support those at risk of recidivism in an equitable manner. In these two cases, the model is essentially the same, but the action taken is very different; an algorithm is a tool that can be used or misused depending on who utilizes it. Professor Ghani described the models the teams created as "what predictive policing needs to be." The teams acknowledge their models could be misused in the wrong hands. Therefore, it is essential to work with trusted social services organizations and continuously monitor the models' implementation.

Equity and Fairness

The most important part of any machine learning project is the method by which it is evaluated and what actions are taken based on that evaluation. This is especially important in the criminal justice system, as a biased or poorly evaluated tool can result in disastrous outcomes in an individual's life. One example is COMPAS, a recidivism risk prediction algorithm used in Broward County, Florida, that ProPublica found to be deeply flawed in its evaluation methods. While it correctly predicted recidivist behavior 60 percent of the time, researchers found it to be racially biased and inequitable. COMPAS misclassified Black defendants as a greater risk for recidivism at a higher rate than White defendants (45 percent to 23 percent), and it mistakenly labeled White defendants as lower risk at a higher rate than Black defendants (48 percent to 28 percent). These biased results could lead to longer sentences, higher bail, or denial of pretrial release for lower-risk, predominantly Black individuals. COMPAS illustrates the point that a machine learning model's performance cannot be judged on accuracy alone. So, what constitutes a model with "good" performance?

Walber, CC BY-SA 4.0 via Wikimedia Commons


According to Professor Ghani, the answer to that question depends on the project's goal. He explained that "performance" in machine learning is a vague term that can have different meanings in different projects. Many machine learning projects measure performance with statistical metrics such as precision (the share of individuals the model flags who truly turn out to be at risk) and recall (the share of truly at-risk individuals the model manages to flag). In contrast, the goal of the recidivism reduction models was to predict individuals at risk of recidivism in the most equitable way possible. This meant equity was to be the main performance metric, taking precedence over precision, recall, or other statistical metrics.
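These two metrics follow directly from the counts of correct and incorrect predictions. The counts in the example are illustrative, not figures from the studies:

```python
def precision_recall(tp, fp, fn):
    """Standard definitions from a confusion matrix:
    tp = true positives (flagged and truly at risk)
    fp = false positives (flagged but not at risk)
    fn = false negatives (at risk but not flagged)."""
    precision = tp / (tp + fp)  # of everyone flagged, how many were right
    recall = tp / (tp + fn)     # of everyone truly at risk, how many we caught
    return precision, recall

# Illustrative counts: 30 correct flags, 10 wrong flags, 20 missed cases.
p, r = precision_recall(tp=30, fp=10, fn=20)  # precision 0.75, recall 0.6
```

Raising the model's flagging threshold typically trades recall for precision, which is why the choice of which metric to prioritize is a policy decision, not just a technical one.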

To achieve their goal of maximizing equity, the teams decided to balance recall across all racial demographics. Professor Ghani explained that by focusing on recall, fewer at-risk individuals would be left out, and by balancing the metric across racial groups, the model would be more racially equitable. He also explained that achieving the same recall ratio in each group is not enough to erase previously existing racial skew in recidivism. Instead, it would be necessary to put more emphasis on racial groups that are overrepresented. For example, the teams noted that recall for Hispanic individuals was much lower than for other ethnicities, meaning the model was selecting a relatively low proportion of at-risk Hispanic individuals, even though Hispanic individuals are overall more likely to be at risk of recidivism. Therefore, the researchers calibrated the models to ensure Hispanic individuals are chosen at a slightly higher rate, to help eliminate this racial disparity over time and ensure an equitable model.
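The first step of that calibration, measuring recall separately for each group, can be sketched as follows. The record format and helper function are invented for illustration and are a simplified stand-in for the teams' actual pipeline:

```python
from collections import defaultdict

def recall_by_group(records):
    """records: iterable of (group, truly_at_risk, flagged) tuples.
    Returns each group's recall: the share of its truly at-risk
    members that the model flagged."""
    tp = defaultdict(int)  # at-risk individuals the model flagged
    fn = defaultdict(int)  # at-risk individuals the model missed
    for group, at_risk, flagged in records:
        if at_risk:
            if flagged:
                tp[group] += 1
            else:
                fn[group] += 1
    return {g: tp[g] / (tp[g] + fn[g]) for g in set(tp) | set(fn)}
```

A disparity in these per-group recall values, like the gap the teams observed for Hispanic individuals, is the signal that per-group selection rates need adjusting.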

Rodolfa et al.

The left chart shows recall by race without taking equity into account; the Hispanic and Race Unknown groups are underrepresented. The middle chart shows recall balanced evenly across races. The right chart shows the chosen model, in which recall is balanced in proportion to each group's prevalence in recidivism.


Recidivism is an issue that affects the most vulnerable and is costly to society. Professor Ghani and other researchers are approaching the challenge of recidivism with machine learning and other predictive tools. Nonetheless, these models are just the start. Professor Ghani described the need for a common, government-backed framework for designing equitable data-driven tools in the public sector. This framework would define goals, regulations, and methodologies that anyone, not just machine learning researchers, can understand. He also emphasized the need for well-defined guidelines that will allow policymakers to evaluate the ethics of data-driven policy tools in the future. Finally, data-driven policy systems such as the recidivism reduction models need to be made scalable. While the models described above are effective as proofs of concept, they cover only two counties and will likely need to be tailored to fit other local jurisdictions' needs. With these steps in place, the rethinking of how criminality is addressed in America can continue, and those who are stuck in the cycle of recidivism can receive help from the system that is meant to protect them.

About the author:
Alex Talbott is a first-year Master’s student in Public Policy and Management on the Data Analytics track at Heinz. He is interested in policies that promote the ethical use of artificial intelligence and related technologies as a tool for doing good.
