A real judge convicted him. A real appeals court upheld it. Then three artificial intelligence systems reviewed the same case and said: not guilty. 

The mock trial, held Oct. 24, 2025, at UNC School of Law as part of the university’s Converge-Con AI Festival, put ChatGPT, Claude, and Grok in the role of jurors for a fictional robbery case modeled on a real juvenile matter from North Carolina. In the actual case, the judge convicted quickly. The AI jury acquitted unanimously — and that disparity is exactly what professor Joseph Kennedy had in mind. 

Kennedy, the Willie Person Mangum Distinguished Professor of Law, designed the simulation around a case he defended through Carolina Law’s Juvenile Justice Clinic. The fictional defendant, Henry Justus, is a 17-year-old Black student accused of robbery at a school where Black students made up just 10% of the population. Prosecutors argued that his physical presence and positioning during the incident constituted criminal assistance. The defense countered that presence alone cannot prove shared intent beyond reasonable doubt. 

Second-year law students argued both sides live, with testimony converted to text and fed to the AI models in real time. After multiple rounds of deliberation, all three systems landed in the same place: the prosecution had not proved that Justus shared the criminal intent of his co-defendant. 

Kennedy did not hide his ambivalence about what the experiment revealed. “I am not sure if I created a cautionary tale about a possible dystopian future, or a roadmap to it,” he said from the bench after the verdict. 

The post-trial panel sharpened the tension. Professor Eisha Jain, Henry P. Brandis Distinguished Professor of Law, called the concept “absolutely dystopian” and pushed back on the framing that cleaner legal reasoning equals better justice. Real jurors bring life experience to deliberations in ways AI cannot replicate. They can also choose jury nullification, declining to convict even when evidence supports guilt, as a democratic check on government power. Jain doubts AI systems would ever ask the more fundamental question: why is this a criminal case at all? 

Matthew Kotzen, chair of UNC’s philosophy department, raised a deeper problem. Large language models predict the next word in a sequence, and whether that process constitutes anything like genuine belief or understanding remains, in his words, “dramatically unsettled.” Applying a standard like reasonable doubt requires a system that actually represents the world, and it is far from clear that these models do. 

Kennedy framed the experiment around four values the criminal justice system tries to serve: accuracy, freedom from bias, efficiency, and legitimacy. How AI performs against each of those remains an open question. Watch the full mock trial below.