12 Dec 2016

Judged by AI

Researchers at University College London, University of Sheffield, and the University of Pennsylvania, Philadelphia have presented the first systematic study on predicting the outcomes of cases tried at the European Court of Human Rights (ECtHR) using Artificial Intelligence.


Using recent advances in Natural Language Processing (NLP) combined with Machine Learning, the researchers built predictive models that identify the patterns behind judicial decisions. Recognising these patterns makes it possible to predict the outcomes of future cases. The AI predicted the outcomes of 584 cases relating to violations of articles of the Convention with 79% accuracy.

Binary Classification Model

The study focuses on the automatic analysis of ECtHR cases that rule on whether a particular article of the European Convention on Human Rights (ECHR or Convention) was violated. In total, 584 cases were selected, relating to potential violations of three articles of the Convention:

  • Article 3, which prohibits torture and inhuman and degrading treatment.
  • Article 6, which protects the right to a fair trial.
  • Article 8, which provides a right to respect for “his private and family life, his home and his correspondence”.

These three articles provided the most easily accessible data, and enough cases to test the models properly. Article 3 cases totalled 250; Article 6 totalled 80; and Article 8 totalled 254. The cases were then balanced so that each Article had an equal number of “violation” and “non-violation” cases.
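The balancing step can be checked with simple arithmetic. The sketch below assumes that each article's cases are split exactly in half; the per-class counts are derived from the totals above, not quoted from the study.

```python
# Case totals per article, as reported above.
cases_per_article = {"Article 3": 250, "Article 6": 80, "Article 8": 254}

# Assumption: "an even number" means an exact 50/50 split within each article.
balanced = {
    article: {"violation": total // 2, "non_violation": total - total // 2}
    for article, total in cases_per_article.items()
}

total_cases = sum(cases_per_article.values())

# With balanced classes, always guessing the same label is right half the time.
chance_accuracy = 0.5

for article, split in balanced.items():
    print(article, split)
print("total:", total_cases)
```

Under that assumption, a classifier that guesses blindly would score about 50%, which is the baseline against which the reported accuracy should be read.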

Each case was broken down into several sections:

  • Procedure, as followed before the court, from the original application to the judgment being handed down.
  • The Facts, defined as all material that is not considered legal arguments.
  • The Law, or legal arguments.

The Facts subdivides as follows:

  • Circumstances, the factual background of the case.
  • Relevant law, or the legal provisions other than the articles of the Convention that are relevant to deciding the case.

N-grams (contiguous sequences of words) were used to represent the textual content of the cases. Specific Topics for each article were then created by clustering semantically similar N-grams, leveraging the distributional hypothesis that similar words appear in similar contexts. Because each case is labelled either violation or non-violation, the task is a binary classification problem, and the textual features (the N-grams and Topics) were used to train Support Vector Machine (SVM) classifiers.
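To make the idea of N-gram features concrete, here is a minimal sketch of N-gram counting. The function name extract_ngrams, the naive tokenisation, and the max_n parameter are assumptions made for illustration; the article does not describe the study's actual preprocessing.

```python
from collections import Counter


def extract_ngrams(text, max_n=2):
    """Count contiguous word sequences of length 1..max_n in a text."""
    # Naive tokenisation (an assumption): lowercase, split on whitespace,
    # and strip surrounding punctuation from each token.
    tokens = [tok.strip(".,;:()\"'") for tok in text.lower().split()]
    tokens = [tok for tok in tokens if tok]
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical sentence, not taken from a real judgment.
sample = "The applicant was detained. The applicant complained."
features = extract_ngrams(sample, max_n=2)
print(features.most_common(3))
```

Each case then becomes a bag of such counts, which is the kind of input a classifier can be trained on.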

A linear kernel was used, which makes it possible to identify the features of a case that weigh most heavily towards a violation or a non-violation. Violation cases are labelled +1 and non-violation cases -1, so a feature with a positive weight points towards a violation and a feature with a negative weight points towards a non-violation.
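The link between the sign of a weight and the predicted outcome can be illustrated with a small linear SVM. This is a sketch only: it trains by sub-gradient descent on the regularised hinge loss in pure Python, omits the bias term for brevity, and uses invented feature names and counts. It is not the implementation used in the study.

```python
def train_linear_svm(samples, labels, epochs=200, lr=0.1, lam=0.01):
    """Fit weights by sub-gradient descent on the L2-regularised hinge loss."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularisation step: shrink every weight slightly.
            w = [wi * (1 - lr * lam) for wi in w]
            if margin < 1:
                # Sample is misclassified or inside the margin: move towards it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (violation) for a positive score, otherwise -1 (non-violation)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


# Hypothetical features and counts, invented for illustration only.
feature_names = ["detention conditions", "appeal dismissed", "the applicant"]
samples = [[2, 0, 1], [1, 0, 1], [3, 0, 1], [0, 2, 1], [0, 1, 1], [0, 3, 1]]
labels = [1, 1, 1, -1, -1, -1]

w = train_linear_svm(samples, labels)
weights = dict(zip(feature_names, w))

# Rank features from most violation-leaning to most non-violation-leaning.
ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
for name, weight in ranked:
    print(f"{weight:+.3f}  {name}")
```

In this toy data the first feature appears only in +1 cases and the second only in -1 cases, so they end up with positive and negative weights respectively. Because the model is linear, each weight can be read directly as evidence for one outcome or the other.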

Results and Accuracy

Both N-gram and Topic features of the cases performed well when tested, indicating that language use and topicality are significant indicators of judicial decisions. The model achieved an overall accuracy of 79%. Circumstances proved the most accurate section for Article 6 and Article 8 cases, with 82% and 77% respectively. For Article 3, the Full case (i.e. everything contained in all the sections of the case) gave the best accuracy at 70%, with Circumstances not far behind on 68%.

The results show that subsections such as Procedure, which contain general information about the applicants, performed less well than subsections such as Circumstances, which set out the factual background. This suggests that the factual text is the most important feature. The poor performance of the Law subsection is partly because a large number of cases do not carry one: cases deemed inadmissible have no Law subsection and lead to judgments of non-violation.

Data Limitations

One of the challenges so far, according to Dr Nikolaos Aletras, who led the study at UCL Computer Science, is providing a qualitative interpretation of the results that helps address questions of interest to legal scholars while staying within the methodological confines of the approach. “Another challenge in building AI systems to study judicial decisions as well as to assist lawyers and judges in prioritising cases is the data availability. Making data sets publicly available would enhance further research on the intersection between legal science and artificial intelligence.”

Moving forward, the researchers are keen to expand the dataset, both in terms of judgments by the ECtHR and in terms of the texts relied upon to make predictions (applications, briefs and motions by parties).

Could this method be used in other areas of law? Cases relating to violations of Convention articles lend themselves to a binary classification model. For other types of law, depending on the options available to the judge in a given case, other methods can be used. “If the judge has more than two options, then other AI methods such as multi-class classification algorithms would be more suitable,” says Aletras.
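As a rough illustration of what a multi-class approach might look like, the sketch below uses a one-vs-rest scheme: one set of linear weights per possible outcome, with the highest-scoring outcome chosen. The outcome labels, the weights, and the function name predict_outcome are all hypothetical and are not drawn from the study.

```python
def predict_outcome(class_weights, x):
    """One-vs-rest: score x against each outcome's weights, return the best one."""
    scores = {
        label: sum(wi * xi for wi, xi in zip(w, x))
        for label, w in class_weights.items()
    }
    return max(scores, key=scores.get)


# Hypothetical outcomes and hand-picked weights, for illustration only.
class_weights = {
    "upheld": [1.0, -0.5, 0.0],
    "dismissed": [-0.5, 1.0, 0.0],
    "remitted": [0.0, 0.0, 1.0],
}

print(predict_outcome(class_weights, [2, 0, 1]))
```

In practice each set of weights would be learned from labelled cases rather than written by hand.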

Dr Vasileios Lampos, also at UCL, says, “I believe that more advanced statistical Natural Language Processing (or AI for simplicity) methods will be required in order to understand and model more complex structures in the data, and hence, be in the position to address harder tasks than the binary classification problem presented in our paper. I think that the European Court of Human Rights was probably the best case study for a binary classification task, given the hard binary nature of its decisions.”

On the subject of other potential areas of law to test it on, Aletras says, “There are a number of things to note here. First, both critics and proponents have to substantiate their claims by engaging in further empirical analysis, which is ultimately the only way to establish whether AI can effectively grapple with more complex areas of law. Second, as already noted, our algorithm is mainly useful for predictions of judgments when the outcome is binary, which is not always the case in other areas. Third, the algorithm extrapolates from past results by spotting patterns between textual features and outcomes. In this sense, it is 'conservative', meaning that it is not able to predict conceptual/doctrinal invention of new concepts.”

Future Application

The tool concerns itself only with outcomes, and not at all with the social function of what it means to be a judge. “In fact, one of the most important functions of judges is not only to decide cases but also to provide extensive reasons as to why cases were decided in a specific way. Our algorithm cannot do that and, to this extent, is constitutionally unable to emulate a judge. We believe that AI systems can have an assisting or analytical role in the legal domain, e.g. in our case for prioritising cases that are most likely to violate an article of the human rights Convention,” says Aletras.

Will lawyers and judges need to fear for, or at least change, their jobs? “While it would be premature to say that lawyers and judges should fear for their jobs, it is important to stress that the more statistical NLP approaches to legal texts become sophisticated and widely available, the more they will tend to procure all kinds of advantages to those that effectively use them,” states Aletras. “For example, our algorithm not only predicts outcomes with a certain reliability but also can spot doctrinal patterns for thousands of cases and ranks textual features as to their importance. At least in this sense, it can be a particularly useful assisting tool for judges, lawyers and legal scholars.”
