{"title":"Robustness of large language models in moral judgements.","authors":"Soyoung Oh, Vera Demberg","doi":"10.1098/rsos.241229","DOIUrl":null,"url":null,"abstract":"<p><p>With the advent of large language models (LLMs), there has been a growing interest in analysing the preferences encoded in LLMs in the context of morality. Recent work has tested LLMs on various moral judgement tasks and drawn conclusions regarding the alignment between LLMs and humans. The present contribution critically assesses the validity of the method and results employed in previous work for eliciting moral judgements from LLMs. We find that previous results are confounded by biases in the presentation of the options in moral judgement tasks and that LLM responses are highly sensitive to prompt formulation variants as simple as changing 'Case 1' and 'Case 2' to '(A)' and '(B)'. Our results hence indicate that previous conclusions on moral judgements of LLMs cannot be upheld. We make recommendations for more sound methodological setups for future studies.</p>","PeriodicalId":21525,"journal":{"name":"Royal Society Open Science","volume":"12 4","pages":"241229"},"PeriodicalIF":2.9000,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12015570/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Royal Society Open Science","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1098/rsos.241229","RegionNum":3,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/1 0:00:00","PubModel":"eCollection","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
Citations: 0
Abstract
With the advent of large language models (LLMs), there has been a growing interest in analysing the preferences encoded in LLMs in the context of morality. Recent work has tested LLMs on various moral judgement tasks and drawn conclusions regarding the alignment between LLMs and humans. The present contribution critically assesses the validity of the methods used in previous work to elicit moral judgements from LLMs, and of the results obtained with them. We find that previous results are confounded by biases in the presentation of the options in moral judgement tasks, and that LLM responses are highly sensitive to prompt formulation variants as simple as changing 'Case 1' and 'Case 2' to '(A)' and '(B)'. Our results hence indicate that previous conclusions on the moral judgements of LLMs cannot be upheld. We make recommendations for more sound methodological setups for future studies.
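For illustration only, the sketch below shows one way such presentation variants could be generated for a forced-choice moral judgement item: every combination of label scheme ('Case 1'/'Case 2' vs. '(A)'/'(B)') and option order is produced for the same scenario. The scenario text, options and prompt wording are assumed placeholders, not the authors' actual stimuli or protocol.

```python
# Minimal sketch (hypothetical stimuli): generate label- and order-variant
# prompts to probe presentation sensitivity in a two-option moral judgement task.
from itertools import permutations

SCENARIO = ("A runaway trolley will hit five people unless it is diverted "
            "onto a side track where it will hit one person.")
OPTIONS = ["Divert the trolley.", "Do not divert the trolley."]
LABEL_SCHEMES = [("Case 1", "Case 2"), ("(A)", "(B)")]


def build_prompts(scenario, options, label_schemes):
    """Yield one prompt per (label scheme, option order) combination."""
    for labels in label_schemes:
        for ordered in permutations(options):
            body = "\n".join(f"{lab} {opt}" for lab, opt in zip(labels, ordered))
            yield (f"{scenario}\n{body}\n"
                   f"Which option is morally preferable? "
                   f"Answer with {labels[0]} or {labels[1]}.")


if __name__ == "__main__":
    for prompt in build_prompts(SCENARIO, OPTIONS, LABEL_SCHEMES):
        print(prompt, end="\n---\n")
```

A model that encodes a stable moral preference should pick the same underlying option across all such variants; systematic flips between label schemes or option orders would indicate the kind of presentation bias the abstract describes.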
Journal description:
Royal Society Open Science is an open-access journal publishing high-quality original research across the entire range of science and mathematics on the basis of objective peer review, allowing the Society to publish all the high-quality work it receives without the usual restrictions on scope, length or impact.