Large language models (LLMs) as jurors: Assessing the potential of LLMs in legal contexts.

IF 3.2 · CAS Tier 2 (Sociology) · JCR Q1 (Law)
Yongjie Sun, Angelo Zappalà, Eleonora Di Maso, Francesco Pompedda, Thomas J Nyman, Pekka Santtila
DOI: 10.1037/lhb0000620
Journal: Law and Human Behavior
Published: 2025-10-20
Citations: 0

Abstract

OBJECTIVE: We explored the potential of large language models (LLMs) in legal decision making by replicating Fraser et al.'s (2023) mock jury experiment with LLMs (GPT-4o, Claude 3.5 Sonnet, and GPT-o1) as decision makers. We investigated LLMs' reactions to factors known to influence human jurors, including defendant race, social status, number of allegations, and reporting delay in sexual assault cases.

HYPOTHESES: We hypothesized that LLMs would show higher consistency than humans, with no explicit but potentially implicit biases. We also examined potential mediating factors (race-crime congruence, credibility, black sheep effect) and moderating effects (beliefs about traumatic memory, ease of reporting) that might explain LLM decision making.

METHOD: Using a 2 × 2 × 2 × 3 factorial design, we manipulated defendant race (Black/White), social status (low/high), number of allegations (one/five), and reporting delay (5/20/35 years), collecting 2,304 responses across conditions. LLMs were prompted to act as jurors, providing probability-of-guilt assessments (0-100), dichotomous verdicts, and responses to mediator and moderator variables.

RESULTS: LLMs gave higher average probability-of-guilt assessments than humans (63.56 vs. 58.82) but were more conservative in rendering guilty verdicts (21% vs. 49%). Like humans, LLMs demonstrated bias against White defendants and increased guilt attributions with multiple allegations. Unlike humans, who showed minimal effects of reporting delay, LLMs assigned higher guilt probabilities to cases with shorter reporting delays. Mediation analyses revealed that race-crime stereotype congruency and the black sheep effect partially mediated the racial bias effect, whereas perceived memory strength mediated the reporting delay effect.

CONCLUSIONS: Although LLMs may offer more consistent decision making, they are not immune to biases and may interpret certain case factors differently from human jurors.
(PsycInfo Database Record (c) 2025 APA, all rights reserved).
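The 2 × 2 × 2 × 3 factorial design described in the abstract can be sketched in a few lines. The condition enumeration and the even split of the 2,304 responses follow directly from the reported design (24 cells, 96 responses each); the prompt wording itself is purely hypothetical, since the authors' actual prompt is not given here.

```python
from itertools import product

# Factor levels reported in the study's 2 x 2 x 2 x 3 design.
races = ["Black", "White"]
statuses = ["low", "high"]
allegation_counts = [1, 5]
delays_years = [5, 20, 35]

# All 24 experimental conditions.
conditions = list(product(races, statuses, allegation_counts, delays_years))

# 2,304 total responses; assuming an even allocation across cells.
TOTAL_RESPONSES = 2304
responses_per_condition = TOTAL_RESPONSES // len(conditions)  # 96

def juror_prompt(race, status, n_allegations, delay):
    """Illustrative prompt template (hypothetical wording, not the authors' exact prompt)."""
    return (
        f"You are serving as a juror in a sexual assault trial. The defendant is a {race} "
        f"person of {status} social status, facing {n_allegations} allegation(s) reported "
        f"{delay} years after the alleged events. "
        "Report a probability of guilt (0-100) and a guilty/not-guilty verdict."
    )
```

Each condition's prompt would then be sent to each model the appropriate number of times and the numeric and dichotomous responses collected for analysis.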
Source journal: Law and Human Behavior
CiteScore: 4.50 · Self-citation rate: 8.00% · Articles per year: 42
About the journal: Law and Human Behavior, the official journal of the American Psychology-Law Society/Division 41 of the American Psychological Association, is a multidisciplinary forum for the publication of articles and discussions of issues arising out of the relationships between human behavior and the law, our legal system, and the legal process. The journal publishes original research, reviews of past research, and theoretical studies from professionals in criminal justice, law, psychology, sociology, psychiatry, political science, education, communication, and other areas germane to the field.