{"title":"Aligning with ideal values: a proposal for anchoring AI in moral expertise","authors":"Erich Riesen, Mark Boespflug","doi":"10.1007/s43681-025-00664-1","DOIUrl":null,"url":null,"abstract":"<div><p>Autonomous AI agents are increasingly required to operate in contexts where human welfare is at stake, raising the imperative for them to act in ways that are morally optimal—or at least morally permissible. The value alignment research program seeks to create “beneficial AI” by aligning AI behavior with human values (Russell in Human compatible: artificial intelligence and the problem of control, Penguin, London, 2019). In this article, we propose a method for specifying permissible outcomes for AI agents that targets ideal values via moral expertise as embodied in the collective judgments of philosophical ethicists. We defend the notion that ethicists are moral experts against several objections found in the recent literature and argue that their aggregated judgments offer the epistemically best available proxy for moral truth. We recommend a systematic study of ethicists’ judgments—using tools from social psychology and social choice theory—to guide AI agents' behavior in morally complex situations.</p></div>","PeriodicalId":72137,"journal":{"name":"AI and ethics","volume":"5 4","pages":"3727 - 3741"},"PeriodicalIF":0.0000,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"AI and ethics","FirstCategoryId":"1085","ListUrlMain":"https://link.springer.com/article/10.1007/s43681-025-00664-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Autonomous AI agents are increasingly required to operate in contexts where human welfare is at stake, raising the imperative for them to act in ways that are morally optimal—or at least morally permissible. The value alignment research program seeks to create “beneficial AI” by aligning AI behavior with human values (Russell in Human compatible: artificial intelligence and the problem of control, Penguin, London, 2019). In this article, we propose a method for specifying permissible outcomes for AI agents that targets ideal values via moral expertise as embodied in the collective judgments of philosophical ethicists. We defend the notion that ethicists are moral experts against several objections found in the recent literature and argue that their aggregated judgments offer the epistemically best available proxy for moral truth. We recommend a systematic study of ethicists’ judgments—using tools from social psychology and social choice theory—to guide AI agents' behavior in morally complex situations.
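The abstract mentions using tools from social choice theory to aggregate ethicists' judgments into guidance for AI agents, but does not spell out a particular aggregation rule. The following is a minimal illustrative sketch, assuming a simple majority-rule aggregation over binary permissibility verdicts; the function names, labels, and toy survey data are hypothetical and not drawn from the paper.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical verdict labels; the paper does not specify an encoding.
PERMISSIBLE = "permissible"
IMPERMISSIBLE = "impermissible"


def aggregate_judgments(judgments: List[str]) -> str:
    """Return the majority verdict among surveyed ethicists for one action.

    Majority rule is only one of many social-choice rules the proposal
    could employ; it stands in here for whatever aggregation the authors
    would ultimately recommend.
    """
    counts = Counter(judgments)
    verdict, _ = counts.most_common(1)[0]
    return verdict


def permissible_actions(survey: Dict[str, List[str]]) -> List[str]:
    """Filter candidate actions to those the aggregated verdict deems permissible."""
    return [
        action
        for action, judgments in survey.items()
        if aggregate_judgments(judgments) == PERMISSIBLE
    ]


if __name__ == "__main__":
    # Toy survey: each key is a candidate action, each value a list of
    # individual ethicists' verdicts (fabricated for illustration only).
    survey = {
        "withhold_information": [IMPERMISSIBLE, IMPERMISSIBLE, PERMISSIBLE],
        "disclose_with_consent": [PERMISSIBLE, PERMISSIBLE, PERMISSIBLE],
    }
    print(permissible_actions(survey))  # -> ['disclose_with_consent']
```

In practice, the aggregation step could be swapped for a weighted or preference-ranking rule (e.g., Borda-style scoring) without changing the surrounding interface; the sketch only illustrates the general shape of turning collective expert judgments into a set of permissible outcomes.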