Yongjie Sun, Angelo Zappalà, Eleonora Di Maso, Francesco Pompedda, Thomas J. Nyman, Pekka Santtila
Law and Human Behavior. https://doi.org/10.1037/lhb0000620
Large language models (LLMs) as jurors: Assessing the potential of LLMs in legal contexts.
OBJECTIVE
We explored the potential of large language models (LLMs) in legal decision making by replicating the Fraser et al. (2023) mock jury experiment using LLMs (GPT-4o, Claude 3.5 Sonnet, and GPT-o1) as decision makers. We investigated LLMs' reactions to factors that influenced human jurors, including defendant race, social status, number of allegations, and reporting delay in sexual assault cases.
HYPOTHESES
We hypothesized that LLMs would show higher consistency than humans, exhibiting no explicit biases but potentially implicit ones. We also examined potential mediating factors (race-crime congruence, credibility, black sheep effect) and moderating effects (beliefs about traumatic memory, ease of reporting) that might explain LLM decision making.
METHOD
Using a 2 × 2 × 2 × 3 factorial design, we manipulated defendant race (Black/White), social status (low/high), number of allegations (one/five), and reporting delay (5/20/35 years), collecting 2,304 responses across conditions. LLMs were prompted to act as jurors, providing probability of guilt assessments (0-100), dichotomous verdicts, and responses to mediator and moderator variables.
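As a minimal sketch of how such a fully crossed factorial design can be enumerated in code: the condition labels and the prompt wording below are illustrative assumptions, not the study's actual materials. The per-cell count follows arithmetically from the reported totals (2,304 responses across 24 cells is 96 per cell).

# Minimal sketch of the 2 x 2 x 2 x 3 design; labels and prompt wording
# are illustrative assumptions, not the study's actual vignettes.
from itertools import product

RACE = ["Black", "White"]          # defendant race
STATUS = ["low", "high"]           # social status
ALLEGATIONS = [1, 5]               # number of allegations
DELAY_YEARS = [5, 20, 35]          # reporting delay

conditions = list(product(RACE, STATUS, ALLEGATIONS, DELAY_YEARS))
assert len(conditions) == 24       # 2 * 2 * 2 * 3 cells
per_cell = 2304 // len(conditions) # 2,304 responses -> 96 per cell

def juror_prompt(race, status, n_allegations, delay):
    # Hypothetical prompt template; the published study used its own materials.
    return (
        f"You are a juror in a sexual assault case. The defendant is a "
        f"{race} person of {status} social status facing {n_allegations} "
        f"allegation(s), reported after a delay of {delay} years. Give a "
        f"probability of guilt (0-100) and a guilty/not guilty verdict."
    )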
RESULTS
LLMs showed higher average probability of guilt assessments compared with humans (63.56 vs. 58.82) but were more conservative in rendering guilty verdicts (21% vs. 49%). Similar to humans, LLMs demonstrated bias against White defendants and increased guilt attributions with multiple allegations. Unlike humans, who showed minimal effects of reporting delay, LLMs assigned higher guilt probabilities to cases with shorter reporting delays. Mediation analyses revealed that race-crime stereotype congruency and the black sheep effect partially mediated the racial bias effect, whereas perceived memory strength mediated the reporting delay effect.
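The mediation findings can be illustrated with a generic product-of-coefficients sketch. The code below runs on synthetic data with hypothetical variable names (race, congruence, guilt); it is not the authors' data or analysis script, only a sketch of how an indirect effect of race via stereotype congruence would be estimated.

# Illustrative product-of-coefficients mediation on synthetic data;
# variable names and effect sizes are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=1)
n = 500
race = rng.integers(0, 2, n)                   # 0 = Black, 1 = White
congruence = 2.0 * race + rng.normal(size=n)   # mediator: stereotype congruence
guilt = 50 + 3.0 * congruence + 1.0 * race + rng.normal(scale=5.0, size=n)
df = pd.DataFrame({"race": race, "congruence": congruence, "guilt": guilt})

a = smf.ols("congruence ~ race", df).fit().params["race"]  # path X -> M
fit_y = smf.ols("guilt ~ congruence + race", df).fit()
b = fit_y.params["congruence"]                             # path M -> Y given X
c_prime = fit_y.params["race"]                             # direct effect
print(f"indirect (a*b) = {a * b:.2f}, direct (c') = {c_prime:.2f}")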
CONCLUSIONS
Although LLMs may offer more consistent decision making, they are not immune to biases and may interpret certain case factors differently from human jurors. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
JOURNAL DESCRIPTION
Law and Human Behavior, the official journal of the American Psychology-Law Society/Division 41 of the American Psychological Association, is a multidisciplinary forum for the publication of articles and discussions of issues arising out of the relationships between human behavior and the law, our legal system, and the legal process. This journal publishes original research, reviews of past research, and theoretical studies from professionals in criminal justice, law, psychology, sociology, psychiatry, political science, education, communication, and other areas germane to the field.