Big Data could be viewed as a great fit for the courtroom. The legal system is built on rules derived from past cases whose precedents are still honored today. Artificial intelligence and machine learning systems work along similar lines: they distill patterns from masses of data and organize that information for human consumption.

In the United States, some jurisdictions use risk assessment algorithms to predict whether a particular convict is likely to commit another crime. Judges have proven more likely to hand out heavier sentences to convicts singled out by these algorithms.

A constant criticism of these algorithms is that they operate like a black box: there is no explanation for the scores they attribute to individual cases. Many view this lack of transparency as a violation of the prisoner’s rights.

The Big Question: Do Algorithms Do Better Than Humans In Predicting Recidivism?

A pioneering study by two Dartmouth College researchers called the efficacy of risk assessment algorithms into question: their research revealed that COMPAS, a popular risk assessment algorithm, performed worse than a random online poll of untrained people at predicting recidivism.

To verify the results of their study, the Dartmouth researchers built their own risk assessment algorithm and concluded that only two data points (the defendant’s age and number of prior convictions) were needed to match the 65 percent success rate of COMPAS, which supposedly crunches 137 data points for greater accuracy.
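
To give a sense of how little machinery such a predictor needs, the sketch below trains a simple two-feature classifier on ProPublica’s public Broward County data. The file name and column names (age, priors_count, two_year_recid) follow ProPublica’s published dataset and are assumptions here, as is the choice of logistic regression; the point is only that two inputs can plausibly reach accuracy in the mid-60s.

# Illustrative sketch only: a two-feature recidivism predictor.
# Assumes ProPublica's "compas-scores-two-years.csv" with columns
# age, priors_count, and two_year_recid (1 if the defendant reoffended).
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("compas-scores-two-years.csv")

X = df[["age", "priors_count"]]   # the two data points
y = df["two_year_recid"]          # whether the defendant actually reoffended

# Mean accuracy over 10 cross-validation folds; a figure in the mid-60s
# would be comparable to the 65 percent reported for COMPAS.
accuracy = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10).mean()
print(f"cross-validated accuracy: {accuracy:.1%}")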

From Wired:

Using Amazon Mechanical Turk, an online marketplace where people get paid small amounts to complete simple tasks, the researchers asked about 400 participants to decide whether a given defendant was likely to reoffend based on just seven pieces of data, not including that person’s race. The sample included 1,000 real defendants from Broward County, because ProPublica had already made its data on those people, as well as information on whether they did in fact reoffend, public.

They divided the participants into groups, so that each turk assessed 50 defendants, and gave the following brief description:

The defendant is a [SEX] aged [AGE]. They have been charged with: [CRIME CHARGE]. This crime is classified as a [CRIMINAL DEGREE]. They have been convicted of [NON-JUVENILE PRIOR COUNT] prior crimes. They have [JUVENILE FELONY COUNT] juvenile felony charges and [JUVENILE MISDEMEANOR COUNT] juvenile misdemeanor charges on their record.

[…]

Overall, the turks predicted recidivism with 67 percent accuracy, compared to Compas’ 65 percent. Even without access to a defendant’s race, they also incorrectly predicted that black defendants would reoffend more often than they incorrectly predicted white defendants would reoffend, known as a false positive rate. That indicates that even when racial data isn’t available, certain data points—like number of convictions—can become proxies for race, a central issue with eradicating bias in these algorithms.
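
The false positive rate mentioned above can be checked directly against the same ProPublica data. The sketch below is one way to do it, again assuming ProPublica’s column names (race, score_text, two_year_recid) and, as in ProPublica’s own analysis, treating any score above “Low” as a prediction of reoffending; none of these details come from the Wired piece itself.

# Illustrative sketch only: false positive rate by race from ProPublica's data.
# A false positive is a defendant who did NOT reoffend but was labeled likely to.
import pandas as pd

df = pd.read_csv("compas-scores-two-years.csv")

# Assumption: treat any COMPAS score above "Low" as a prediction of reoffending.
df["predicted_recid"] = (df["score_text"] != "Low").astype(int)

for race, group in df.groupby("race"):
    non_reoffenders = group[group["two_year_recid"] == 0]
    if len(non_reoffenders) == 0:
        continue
    # Share of non-reoffenders who were nonetheless labeled as likely to reoffend.
    fpr = non_reoffenders["predicted_recid"].mean()
    print(f"{race}: false positive rate = {fpr:.1%}")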

What all this means is that there is a great need for more transparency in the creation and deployment of algorithms that affect the public welfare. Similarly, the developers of these algorithms must be held accountable for the performance of their software, and government agencies must have a process for verifying the effectiveness and fairness of the algorithms used in the public space.

Read the full research paper here.