Judges vs. Algorithms

In an ideal world, the criminal justice system would sentence defendants with similar offenses and criminal histories in the same way.

It would be absurd if similar cases yielded wildly different results simply due to chance or an arbitrary decision. But we must accept that whenever judgment is involved, there will be noise.

By noise, I mean that the outcome will differ depending on who does the judging.

Judges around the world have a lot of freedom to decide what sentence is suitable for someone who is found guilty of a crime. This is for good reason.

We must not forget that criminals are human too. In fact, society's treatment of convicted criminals reflects its values. No one is above the law, but individuals convicted of serious crimes lose their freedom and the everyday pleasures we take for granted, and are cast out by society with their reputations in ruins.

The criminal justice system can keep individuals in jail for weeks, months, or even years before they are found guilty. During this time, they may endure inhumane conditions, such as being allowed out of their cell for only one hour a day.

Judges, who have witnessed the entire case from beginning to end, are the most suitable people to weigh all the mitigating and aggravating factors. They are best placed to understand the circumstances that led the defendant to commit the crime.

In theory, this sounds great and just.

The defendant has the right to be heard and judged based on their individual case. However, an analysis of millions of criminal court cases over the years paints a different picture.

Looking at the data, it is as if sentences were produced by a random number generator. Remarkably similar cases end up with wildly different outcomes.


One anecdote illustrates the absurdity. Two men with no prior criminal records were convicted of cashing counterfeit checks for $58.40 and $35.20 respectively. Astonishingly, the first man was sentenced to 15 years, while the second got only 30 days. A difference of $23.20 cannot justify such a huge disparity in their sentences.

This pattern repeats itself not only when reviewing underlying data, but also in studies asking judges to pass sentence on hypothetical cases. In many studies, judges show huge variance in opinion, and often fail to agree on basic outcomes, such as the appropriateness of incarceration.

Data from millions of cases demonstrate that even unexpected factors can have a huge impact on outcomes. These include the time of day, the temperature, the defendant's race, the judge's gender, and even whether the local sports team won a game that weekend.

These insights are not novel. In 1973, Judge Marvin Frankel drew public attention to the problem. In the late 1970s, William Austin and Thomas Williams conducted studies that confirmed it.

And so the question is, what can we do about it?

When the problem was first identified, technology offered little help: computing power was limited and artificial intelligence was in its infancy.

In 1984, the US Congress passed the Sentencing Reform Act to reduce the disparity in sentences given by different judges. This law created a Sentencing Commission whose purpose was to create mandatory guidelines that would limit the range of criminal sentences.

Judges could depart from these ranges, but a departure opened the sentence to review by an appeals court.

Let’s take a step back.

What has to be reviewed to reach a sentence in a criminal case? Broadly, there are only two things to consider.

The first is the crime itself and the results it caused. What did the defendant do wrong? What was the defendant thinking at the time? Was the defendant in full control of their mental capacity?

The second factor is the defendant's criminal history. Most countries are more lenient toward first-time offenders than repeat offenders, and it is easy to see why. Anyone can make a mistake, but a career criminal who keeps offending is unlikely to change their ways. The cost of career criminals to society is immense, so we want to discourage repeat offending with harsh sentences or even by "throwing away the key".

We may assume it is easy to create sentencing guidelines based on the objective facts of the guilty act and the defendant's criminal history. However, it is much harder to assess the mens rea, the intent behind the act, and any partial defenses.

Once a defendant has been proven guilty, we would expect the range of possible sentences to be narrow, since the question of guilt has already been settled.

Judges would still have some leeway to make individual decisions, but this would reduce the system’s overall noise and variance.
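To make the idea concrete, here is a minimal Python sketch of a guideline grid built on the two factors described above: how serious the offense is and the defendant's criminal history. The grid, its numbers, and the function names (`guideline_range`, `needs_review`) are hypothetical and invented for illustration; they are not taken from any real sentencing guidelines. The judge chooses a sentence within the range for the case, and a sentence outside it is flagged.

```python
# Hypothetical guideline grid: keys are offense-severity levels (1 = least serious),
# inner keys are criminal-history categories (0 = first-time offender).
# Each cell holds an invented (min, max) sentencing range in months.
GUIDELINE_GRID = {
    1: {0: (0, 6), 1: (2, 8), 2: (4, 10)},
    2: {0: (6, 12), 1: (10, 16), 2: (14, 20)},
    3: {0: (12, 18), 1: (18, 24), 2: (24, 30)},
}


def guideline_range(offense_level: int, history_category: int) -> tuple[int, int]:
    """Look up the permitted sentencing range (in months) for a case."""
    return GUIDELINE_GRID[offense_level][history_category]


def needs_review(sentence_months: int, offense_level: int, history_category: int) -> bool:
    """Return True if the chosen sentence falls outside the guideline range."""
    low, high = guideline_range(offense_level, history_category)
    return not (low <= sentence_months <= high)
```

For example, `guideline_range(2, 1)` returns `(10, 16)`, so a 15-month sentence passes while a 36-month sentence would be flagged. Where to land inside the range is still the judge's call, which is the remaining leeway.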

Deciding on an appropriate sentence for a particular crime is complex. If we focus solely on the historical data of sentences given for that crime, we risk replicating the same bias we are trying to eliminate. This bias may be related to the socioeconomic or racial group of the offender. In this case, relying on data alone will not resolve the issue, but rather make it worse.

It is hard to determine the sentencing range for different crimes using only a rational approach. No clear consensus exists on what an appropriate punishment should be for each crime.

Consider how differently many European and Asian countries deal with drug smuggling.

In Europe, drug smugglers caught with hard drugs, like heroin, can get prison sentences of up to 15 years, but usually less. In many Asian countries, however, the punishment for smuggling large amounts of hard drugs is death.

Why such a huge difference?

In many of these Asian countries, most murderers do not get the death penalty because they can plead a partial defence and receive a multi-decade sentence or life imprisonment. Drug smugglers, however, can receive harsher sentences than murderers, because drug smuggling is seen as worse than murder. A murderer might kill one or a few people, while a drug smuggler could ruin hundreds of individuals and families.

It is hard to reconcile the two conflicting views. Which is worse: killing one person directly or smuggling drugs that can cause the death of many and disrupt social order?

An alternative, though impractical, approach to this problem would be to have multiple judges review the case and pass independent sentences. Then, the average of those sentences would be the final sentence. This is impractical because most criminal justice systems are already overstretched. To create a meaningful average, we would need at least three, and perhaps even five, judges. Even if not every judge has to sit through the entire case, it would still significantly increase the workload on an already strained system with limited resources.
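Why would averaging help? If we assume each judge's sentence is the "right" sentence plus independent random noise, averaging the sentences of n judges shrinks the spread of outcomes by a factor of roughly the square root of n. The Python simulation below is a toy model under exactly that assumption; the function name `simulate_panel`, the true sentence of 24 months, and the noise of 6 months are all hypothetical. It does not capture shared bias: if every judge leans the same way, averaging leaves that lean untouched.

```python
import random
import statistics


def simulate_panel(true_sentence_months: float, noise_sd: float, panel_size: int,
                   trials: int = 10_000, seed: int = 0) -> list[float]:
    """Simulate many cases, each sentenced by `panel_size` independent judges.

    Assumption (toy model): each judge's sentence is the true sentence plus
    independent Gaussian noise. The final sentence is the panel's average.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    outcomes = []
    for _ in range(trials):
        sentences = [rng.gauss(true_sentence_months, noise_sd) for _ in range(panel_size)]
        outcomes.append(statistics.mean(sentences))
    return outcomes


for size in (1, 3, 5):
    results = simulate_panel(true_sentence_months=24, noise_sd=6, panel_size=size)
    print(f"panel of {size}: spread (std dev) = {statistics.stdev(results):.1f} months")
```

With these made-up numbers, a single judge's sentences spread by about 6 months, while a panel of five brings the spread below 3 months, in line with 6 divided by the square root of 5 (about 2.7).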

We must carefully examine whether artificial intelligence and algorithms can be used to determine sentences. This is not a perfect solution either. Artificial intelligence systems often rely on machine learning, and the data for these systems comes from previous cases. This raises the same problem as building sentencing guidelines from prior data: we would codify existing biases.
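A deliberately simple Python sketch shows how this happens. Suppose the "model" learns nothing more than the average historical sentence for each combination of offense and group. The records below are invented, and they assume that group "B" was historically sentenced more harshly than group "A" for the same offense; `fit_average_model` is a hypothetical name, not a real library function.

```python
from collections import defaultdict
from statistics import mean

# Invented historical records: (offense, group, sentence in months).
# Assumption: group "B" received longer sentences than group "A" for the same offense.
history = [
    ("check_fraud", "A", 6), ("check_fraud", "A", 8), ("check_fraud", "A", 7),
    ("check_fraud", "B", 10), ("check_fraud", "B", 12), ("check_fraud", "B", 11),
]


def fit_average_model(records):
    """'Learn' the mean historical sentence for each (offense, group) pair."""
    buckets = defaultdict(list)
    for offense, group, months in records:
        buckets[(offense, group)].append(months)
    return {key: mean(values) for key, values in buckets.items()}


model = fit_average_model(history)
print("group A:", model[("check_fraud", "A")], "months")
print("group B:", model[("check_fraud", "B")], "months")
```

The model recommends 7 months for group A and 11 months for group B for the same offense, faithfully reproducing the historical gap. Real machine-learning systems are far more complex, but the principle carries over: a model trained to match past sentences learns past disparities along with everything else. Simply deleting the group column does not necessarily fix this, because other features can act as proxies for it.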

This is a known fact: men are overrepresented in STEM subjects, and the middle classes are overrepresented in top universities. These universities are likely to produce the people who build advanced and complex AI systems.

A small minority of society, an intellectual elite, would then be tasked with building a system that judges everyone.

There is also the issue that many of these systems grow and learn by themselves until they become so complex that no single human being understands how they work. This is a serious problem: the reasoning behind a sentence could become opaque, and we would start treating criminals like cogs in a machine.

Any sentence, however fair it is, can be perceived as unfair if its reasoning is unclear.

We should not focus on which system is better. Instead, we should ask which system produces the least bad results. In other words, what is the least worst alternative? Should we continue with the current system, where judges have considerable freedom in sentencing within either mandatory or advisory guidelines? We are aware of the many biases and issues with the current system, but at least we understand how it works.

I worry that if we attempt to move toward algorithmic sentencing, we will be unable to turn back. We will have to accept whatever the machine produces without question.

Alternatively, an AI system could suggest a sentence for a panel of judges to review. This would give judges an extra data point to consider, potentially leading to a better outcome.

This is similar to "centaur" chess, where, at least for a time, humans assisted by chess engines played better than the engines by themselves. While chess engines had incredible calculating ability, they could miss long-term strategies that were obvious to humans but impossible to calculate in the short term.

In the same way, artificial intelligence systems may struggle to recognize when a sentence is obviously inhumane, whereas we can at least hope for some humanity from judges.

