
Amicus curAI?

The implementation of AI into the judicial process must be handled with care

“All devices have their dangers. The discovery of speech introduced communication—and lies. The discovery of fire introduced cooking—and arson… The automobile is marvellously useful—and kills Americans by the tens of thousands each year. Medical advances have saved lives by the millions—and intensified the population explosion.” — Asimov, Robot Visions

All technological advances have their unforeseen consequences. When it comes to Artificial Intelligence (AI), those consequences are often the entire focus of discussion. At a March 2024 conference on AI’s future role in the UK’s legal system, Sir Geoffrey Vos, Master of the Rolls, implored the UK’s judiciary to become “just as familiar with the use of AI as any lawyer”. Just what level of familiarity that is, or why it would be necessary, raises more questions than it answers. Vos may also be asking a bit much; after all, HMCTS only began familiarising itself with video hearings in 2020. The judiciary may also be a little late in its techno-optimism: at the time of writing, there is a growing distrust of the AI boom which, in recent years, has helped cement the unassailable position of the “Magnificent Seven” in the US stock market. If the judiciary is courting such technologies, it is worth evaluating whether they are capable of threatening our justice system and, further, whether they provide opportunities in proportion to the claims. More fundamentally, what is AI?

What precisely is AI?

The UK’s Guidance for Judicial Office Holders on AI was published on 12 December 2023. Spanning only six pages, it is a text proportional to AI’s usage by the UK’s judiciary, which remains low. The Guidance defines AI broadly as “computer systems able to perform tasks normally requiring human intelligence” — a definition so broad it may just as well include the Casio pocket calculator, or any modern search engine. IBM, the pioneer of early supercomputers such as Deep Blue (which beat world chess champion Garry Kasparov in 1997), defines AI as “technology that enables computers and machines to simulate human intelligence and problem-solving capabilities”. This better captures what so-called AI products accomplish: they simulate human intelligence and problem-solving, artificially. They are not, as the Judicial Guidance seems to suggest, computer systems which have removed the requirement for any input of “human intelligence”; instead, they mimic human problem-solving. Even the most responsive AI chatbots require human input, known as prompts.

AI, as currently understood by consumers, is little more than a complex search engine capable of presenting its search results in a long-form, human-like manner. OpenAI, the pioneer behind ChatGPT, announced a search-oriented wrapper for its model on 25 July 2024. Generative AI chatbots, and the Large Language Models (LLMs) which live in their back-ends, have been trained to produce results, lists and solutions to questions in a manner formally similar to that of a Reddit or Quora user. They have been trained on real responses, written by such users. The crucial difference between GPT-4 and a user on Quora or Reddit is that AI products are more likely to be popularly regarded as competent. However, research from Purdue University suggests that “52% of ChatGPT answers contain incorrect information and 77% are verbose”. Your query is thus likely to be answered incorrectly half of the time, yet you have a much higher chance of believing an AI model when it is wrong — 39 per cent of individuals in a study overlooked factual errors because of the verbose presentation. AI may therefore be uniquely unsuited for involvement in the judging process, due to its tendency to beguile the reader through sheer verbosity.

AI alarmism, done for financial gain?

The tendency for AI’s capabilities to be over-emphasised, arguably to inflate share prices, should be a key factor when considering whether it has a role to play in the judiciary. Many of the world’s major financial institutions have suggested that such exaggeration may have occurred in Q3 of 2024. Investors have reduced their holdings in AI assets, citing a lack of “use case” and arguing that big tech companies have routinely misallocated capital in the quantity and manner of their AI infrastructure build-out; something forewarned by the technology venture capital firm Sequoia Capital in September 2023.

Between 2020 and 2024, both AI accelerationists and those in favour of regulating AI have been keen to overemphasise the capabilities of these technologies. The UK’s Civil Service and judiciary have listened. That said, Silicon Valley incubator Y Combinator boasts 97 active AI startups, only two of which mention legal applications. Venture capitalist and cofounder of Andreessen Horowitz, Marc Andreessen, presented his dizzying fantasies about AI teaching children, therapising adults and unravelling our universe in June 2023, pausing to mention law only as a regulatory obstacle. Perhaps in response, AI regulation advocates, best represented by the EU Commission, have stated that AI usage in critical infrastructure, including the administration of justice, represents the highest danger in their risk-based approach.

A cult-like status has developed around advocates for the uptake of AI. Professor Klaus Schwab, Chairman of the World Economic Forum, has said that AI demonstrates “the greatest possible benefit to serve humanity” in what he describes as a Fourth Industrial Revolution. Extropic’s “Beff Jezos” has astroturfed an entire social movement to similar ends, encouraging undergraduate-age tech enthusiasts to become cheerleaders in a series of funding rounds (he described this as “not a cult”).

English philosopher Nick Land, dubbed the “father of accelerationism”, has shown a more cautious optimism, suggesting that “all the financial markets, corporate planning processes, engineering methods, and science fiction progress narratives” are a form of artificial intelligence already; ones simply running on human hardware. In 2014, Elon Musk, in a more alarmist spirit, stated that “with artificial intelligence, we are summoning the demon”. AI alarmism has proven profitable; Musk now deals in some of its most sophisticated uses, in his pursuit of driverless vehicles, and oversees a more familiar AI chatbot by the name of Grok, embedded in Twitter (now X.com).

I opened this article with a quotation on the risks associated with disruptive technologies, using the words of sci-fi author Isaac Asimov. This is because the imagined capabilities of AI, spoken of by both the pro-regulatory camp and the accelerationist VCs of Silicon Valley, have one thing in common: they are the realm of science fiction, pulled from rich cultural mineshafts ranging from The Terminator (AI in defence) to Minority Report (predictive AI in justice). Companies from Palantir to the now-defunct CaseCrunch have relied heavily on this wellspring of promise for investment. Viewed cynically, the AI trend may be seen as an attempt to catch the attention of the investing world, which is now showing caution. The judiciary of the UK should take heed.

What can AI actually do for the law?

In 1997 IBM’s Deep Blue supercomputer defeated chess grandmaster Garry Kasparov. A similar battle between lawyers and the CaseCrunch AI company took place in 2017. Like Deep Blue, today’s AI products are suited to being trained on well-trodden tasks like chess, for which there is a glut of available data and predictable outcomes. In CaseCrunch’s case, this was Payment Protection Insurance (“PPI”) complaints made to the Financial Ombudsman. One hundred lawyers, mainly solicitors from commercial law firms, were given an unlimited amount of time to decide whether (previously decided) PPI complaints had been upheld or rejected by the Ombudsman. Over 775 predictions, the CaseCrunch system had an accuracy rate of 86.6 per cent, while the lawyers had an accuracy rate of 62.3 per cent.

While the performance of CaseCrunch clearly exceeded that of the lawyers, we must analyse the results in context. First, much like Deep Blue’s match with Kasparov, this was a publicity exercise for CaseCrunch. Secondly, Kasparov and Deep Blue both played chess; commercial solicitors’ firms do not typically deal with Ombudsman complaints, whereas the Financial Ombudsman Service does. CaseCrunch did, however, demonstrate AI’s accuracy when predicting outcomes involving settled law and a limited scope of evidence. Essentially, it proved it was accurate at screening facts, not at judging law. If properly trained, such products could ease caseloads in the lower courts through similar screening, freeing up resources better used to address the 350,000 cases outstanding in Magistrates’ Courts alone as of October 2023. Similarly, backlogs of non-court complaints could be reduced, such as the fitness-to-practise backlogs experienced by the HCPC and NMC, among others. Much like any supermarket self-checkout area, this should be accompanied by hawkish human oversight, meaning that AI would serve merely as a force-multiplication tool for human decision-makers.

For barristers, the prospect of replacement by AI is even more remote. Early in March 2023, OpenAI announced that its GPT-4 LLM had scored in the 90th percentile on the New York State Bar exams in under six minutes. In keeping with the tendency to overestimate AI, it was revealed in 2024 that this claim had been overstated: the figure held only when compared against repeat test takers. GPT-4 had in fact scored “in the 69th percentile of all test takers and in the 48th percentile of those taking the test for the first time”. In the essay-writing portion of the examinations, GPT-4 was placed in the 48th percentile of all test takers and in the 15th percentile of those taking the test for the first time. These are embarrassingly poor results, given the on-rails, highly standardised nature of such essays, and given that the OpenAI team had access to ample training data for the relevant examinations. Essentially, AI is incapable of writing competently — much less of competing with the factually novel opinion-writing work typical of the Bar.

Given that AI such as GPT-4 is best suited to multiple-choice examinations, there is a case for the use of AI products in populating similarly formulaic procedural forms. The same may be true of pre-action procedure, reducing costs for litigants in person and barristers alike. The current Judicial Guidance agrees that AI is best suited to “administrative tasks like composing emails and memoranda…[and]…summarising large bodies of text”, though not judgments, given that consumer AI products are incapable of distinguishing judges’ obiter comments from the ratio. Human oversight would still be needed: the key risk here is outlined in the Judicial Guidance, namely that AI products frequently “make up fictitious cases, citations or quotes, or refer to legislation, articles or legal texts that do not exist”. For barristers, overreliance on AI risks falling foul of their duty not to mislead the court. For litigants in person, it risks misleading them into pursuing litigation. Essentially, while AI products designed for the legal market may someday take on the role of paralegals, for now they are by no means as reliable.

What shouldn’t AI do for the judiciary?

While effective at predicting the prospects of success in matters of settled law and limited fact, AI systems, as I see it, have no place in predicting the outcome of novel claims or test cases. AI lacks the capacity to make such decisions, for two reasons. Firstly, a judge has a right to use their personal knowledge, properly applied and within reasonable limits, on matters which are within the common knowledge, as in Reynolds v Llanelly Associated Tinplate Co Ltd [1948] 1 All ER 140. The principle in Reynolds may remain relevant in relation to AI’s ability to draw upon knowledge outside the common knowledge in its decision-making; such use can never be ruled out without fully transparent systems, thereby adding routes for appeal, as seen in Rollinson, and creating a need to audit decisions technically — a cost the judiciary cannot presently afford to incur.

Secondly, and more fundamentally, only humans should be permitted to shape the law to which those in England & Wales are subject. Two of the four clauses of Magna Carta that remain valid today forcefully explain why. Clause 39 states that “no free man shall be seized, imprisoned, dispossessed, outlawed, exiled or ruined in any way” but by “lawful judgement of his peers and the law of the land”. An AI product can hardly be regarded as a peer.

Clause 40 goes further, signalling why reliance on such for-pay products has no place in our legal system; it states: “To no one will we sell, to no one will we deny or delay right or justice”. A judicial reliance on AI licences would contradict one of our most fundamental constitutional principles by de facto privatising justice.

In 1539, Richard Moryson compared the role of the lawyer to an “English tailor [making] of Italian velvet, an English gown”. The idea of law being properly fitted to the nation, by its own people, has hung heavy in discussions surrounding retained EU law following the end of the transition period in 2020. If we have chosen to reject the law of the continent, why would we allow English common law to be shaped by machines unfamiliar with our customs — or, more accurately, with the customs of being human at all?

Importing law & amplifying prejudices 

AI decision-making does more than risk importing knowledge beyond the scope of what is appropriate to draw upon. The Judicial Guidance rightly suggests that AI struggles to recognise which jurisdiction it is operating in, pointing to its tendency to use “American spelling”, “overseas cases” and to espouse views “based heavily on US law”. The manner in which AI models like GPT-4 are trained, and the North American emphasis of the data they are trained on, risk importing views typical of US law to our shores.

This is not only an issue of principle, but of practice too. If US-derived training data contains existing biases, AI systems reinforce them, becoming a gateway to the worst practices of judicial decision-making. In sentencing, for example, it was found in the US that judges relying on AI in determining sentences handed down “markedly less jail time in tens of thousands of cases, but also appeared to discriminate against Black offenders despite the algorithms’ promised objectivity”. It is possible the judiciary would import a preference for “lengthy sentencing of minorities”, at a rate commensurate with the US, under the guise of machine neutrality.

Such amplification of human bias is apparent in AI’s application in facial ID, which the present government appears keen to deploy. Researchers at MIT & Stanford found that AI could correctly identify white faces 97 per cent of the time; in the case of black women, however, it identified the wrong person entirely 20 per cent of the time.

When it comes to absorbing cultural differences baked into training data, we might draw comparisons with the European Court of Human Rights (ECtHR). The judgment in Tyrer v United Kingdom controversially attempted to bring UK law into line with “commonly accepted standards in the penal policy of member states”. The same homogenising effect is true of any AI trained on foreign data. Lord Hoffmann criticised the ECtHR’s failure to understand the “nuances of the domestic laws of Member States”. If our European neighbours cannot understand the spirit of domestic law to a satisfactory level, it is doubtful a machine would be more successful.

Plateauing capabilities

Looking at the prospects of AI’s improvement in the coming decades requires examining the limitations on its development today. AI’s capabilities have plateaued almost as quickly as they accelerated. Francesco Federico, a non-executive director at S&P Global, neatly summarises why this is so. Federico suggests that the development of AI, especially large language models (LLMs), has been driven by the vast seas of data harvested from the internet. Essentially, AI models have been trained on your data, and that of millions of Reddit and Quora users. Google’s AI model has had the advantage of a huge historical database of user inputs from the most popular search engine in the world.

The bottleneck on training is clear: the accessibility of your data. As regulatory oversight prepares to evaluate AI and sizable AI developers like Alphabet and Microsoft exhaust the available data they can legally scrape, producers of these exciting products face a war on two fronts. While cases such as Prismall v DeepMind, Lloyd v Google and Schrems I and II have insulated data collectors from class actions in the EU and UK, the EU Commission appears eager to usurp that role, establishing an AI Office specifically dedicated to data safety.

The importance of regulatory oversight is generally shared; a 2023 survey found that 76 per cent of the UK public believe AI should be independently regulated, while 62 per cent believe the government should perform this task. The same survey found that respondents in Canada, France, Germany, Japan, the UK and the US all believed AI came with more risks than benefits. In effect, the same catastrophising hype which has driven the AI boom for the Googles and Microsofts is likely to bring about its end, as their exploitation of user data is further regulated.

The tech industry’s response to this limitation on data has been to create what is known as synthetic data, generated by AI systems rather than derived from human activity. As one would expect, using such data comes with issues. First, it lacks the nuance of real-world data, making LLMs trained on it ill-equipped for the unpredictable; one might compare it to the way political echo chambers form in humans. Second, it amplifies existing biases, essentially making AIs trained on it wrong more frequently. Third, synthetic data lacks human creativity — it rarely surprises you. In short, AI will likely become less prepared in the face of the unpredictable, more unreliable in its solutions and less creative — hardly qualities desirable in judicial office holders or barristers.

Summing up

In 2006 Lord Bingham set out his first principle on the rule of law, since invoked by the House of Lords: “The law must be accessible and so far as possible intelligible, clear and predictable”. At present, however, the functions of AI in the process of judging do not appear clear even to the judiciary itself, much less the wider public. What is clear is that the hype is waning. AI may hold promise in the screening of cases, yet its current capabilities are overstated and its development in the coming decades will be constrained by practical and regulatory challenges. The 2023 Judicial Guidance shows keen insight into AI’s potential to mislead litigants, barristers and judges at the expense of justice. The implementation of AI into critical areas of the judicial process, capable of altering lives and precedents alike, must always be accompanied by human oversight — even in a century’s time.
