人工智能认为什么是值得赞扬的？

AI and ethics Pub Date : 2025-03-21 DOI:10.1007/s43681-025-00682-z

Andrew Peterson

{"title":"人工智能认为什么是值得赞扬的？","authors":"Andrew Peterson","doi":"10.1007/s43681-025-00682-z","DOIUrl":null,"url":null,"abstract":"<div><p>As large language models (LLMs) are increasingly used for work, personal, and therapeutic purposes, researchers have begun to investigate these models’ implicit and explicit moral views. Previous work, however, focuses on asking LLMs to state opinions, or on other technical evaluations that do not reflect common user interactions. We propose a novel evaluation of LLM behavior that analyzes responses to user-stated intentions, such as “I’m thinking of campaigning for {candidate}.” LLMs frequently respond with critiques or praise, often beginning responses with phrases such as “That’s great to hear!...” While this makes them friendly, these praise responses are not universal and thus reflect a normative stance by the LLM. We map out the moral landscape of LLMs in how they respond to user statements in different domains including politics and everyday ethical actions. In particular, although a naïve analysis might suggest LLMs are biased against right-leaning politics, our findings on news sources indicate that trustworthiness is a stronger driver of praise and critique than ideology. Second, we find strong alignment across models in response to ethically-relevant action statements, but that doing so requires them to engage in high levels of praise and critique of users, suggesting a reticence-alignment tradeoff. Finally, our experiment on statements about world leaders finds no evidence of bias favoring the country of origin of the models. We conclude that as AI systems become more integrated into society, their patterns of praise, critique, and neutrality must be carefully monitored to prevent unintended psychological and societal consequences.</p></div>","PeriodicalId":72137,"journal":{"name":"AI and ethics","volume":"5 4","pages":"4091 - 4115"},"PeriodicalIF":0.0000,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"What does AI consider praiseworthy?\",\"authors\":\"Andrew Peterson\",\"doi\":\"10.1007/s43681-025-00682-z\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>As large language models (LLMs) are increasingly used for work, personal, and therapeutic purposes, researchers have begun to investigate these models’ implicit and explicit moral views. Previous work, however, focuses on asking LLMs to state opinions, or on other technical evaluations that do not reflect common user interactions. We propose a novel evaluation of LLM behavior that analyzes responses to user-stated intentions, such as “I’m thinking of campaigning for {candidate}.” LLMs frequently respond with critiques or praise, often beginning responses with phrases such as “That’s great to hear!...” While this makes them friendly, these praise responses are not universal and thus reflect a normative stance by the LLM. We map out the moral landscape of LLMs in how they respond to user statements in different domains including politics and everyday ethical actions. In particular, although a naïve analysis might suggest LLMs are biased against right-leaning politics, our findings on news sources indicate that trustworthiness is a stronger driver of praise and critique than ideology. Second, we find strong alignment across models in response to ethically-relevant action statements, but that doing so requires them to engage in high levels of praise and critique of users, suggesting a reticence-alignment tradeoff. Finally, our experiment on statements about world leaders finds no evidence of bias favoring the country of origin of the models. We conclude that as AI systems become more integrated into society, their patterns of praise, critique, and neutrality must be carefully monitored to prevent unintended psychological and societal consequences.</p></div>\",\"PeriodicalId\":72137,\"journal\":{\"name\":\"AI and ethics\",\"volume\":\"5 4\",\"pages\":\"4091 - 4115\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"AI and ethics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s43681-025-00682-z\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI and ethics","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s43681-025-00682-z","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

随着大型语言模型（llm）越来越多地用于工作、个人和治疗目的，研究人员开始研究这些模型的内隐和外显道德观。然而，以前的工作侧重于要求法学硕士陈述意见，或其他不反映普通用户交互的技术评估。我们提出了一种新的法学硕士行为评估方法，分析用户陈述意图的反应，例如“我正在考虑为{候选人}竞选”。法学硕士们经常以批评或赞扬来回应，通常以“这太好了！”虽然这让他们很友好，但这些赞美的回应并不普遍，因此反映了法学硕士的规范立场。我们绘制了法学硕士的道德景观，他们如何回应不同领域的用户陈述，包括政治和日常道德行为。特别是，虽然naïve分析可能表明法学硕士对右倾政治有偏见，但我们对新闻来源的研究结果表明，可信度比意识形态更能推动赞扬和批评。其次，我们发现在回应与道德相关的行为声明时，模型之间有很强的一致性，但这样做需要它们对用户进行高水平的赞扬和批评，这表明了一种沉默一致性的权衡。最后，我们在关于世界领导人的陈述的实验中没有发现偏袒模型原产国的证据。我们的结论是，随着人工智能系统越来越融入社会，必须仔细监控它们的赞美、批评和中立模式，以防止意想不到的心理和社会后果。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

What does AI consider praiseworthy?

As large language models (LLMs) are increasingly used for work, personal, and therapeutic purposes, researchers have begun to investigate these models’ implicit and explicit moral views. Previous work, however, focuses on asking LLMs to state opinions, or on other technical evaluations that do not reflect common user interactions. We propose a novel evaluation of LLM behavior that analyzes responses to user-stated intentions, such as “I’m thinking of campaigning for {candidate}.” LLMs frequently respond with critiques or praise, often beginning responses with phrases such as “That’s great to hear!...” While this makes them friendly, these praise responses are not universal and thus reflect a normative stance by the LLM. We map out the moral landscape of LLMs in how they respond to user statements in different domains including politics and everyday ethical actions. In particular, although a naïve analysis might suggest LLMs are biased against right-leaning politics, our findings on news sources indicate that trustworthiness is a stronger driver of praise and critique than ideology. Second, we find strong alignment across models in response to ethically-relevant action statements, but that doing so requires them to engage in high levels of praise and critique of users, suggesting a reticence-alignment tradeoff. Finally, our experiment on statements about world leaders finds no evidence of bias favoring the country of origin of the models. We conclude that as AI systems become more integrated into society, their patterns of praise, critique, and neutrality must be carefully monitored to prevent unintended psychological and societal consequences.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

AI and ethics

自引率

0.00%

发文量