Ryan Carey, Eric Langlois, Chris van Merwijk, Shane Legg, Tom Everitt
Artificial Intelligence, Volume 348, Article 104408 (published 2025-09-02)
DOI: 10.1016/j.artint.2025.104408
https://www.sciencedirect.com/science/article/pii/S0004370225001274
Citations: 0
Incentives for responsiveness, instrumental control and impact
Abstract
We introduce three concepts that describe an agent's incentives: response incentives indicate which variables in the environment, such as sensitive demographic information, affect the decision under the optimal policy. Instrumental control incentives indicate whether an agent's policy is chosen to manipulate part of its environment, such as the preferences or instructions of a user. Impact incentives indicate which variables an agent will affect, intentionally or otherwise. For each concept, we establish sound and complete graphical criteria, and discuss general classes of techniques that may be used to produce incentives for safe and fair agent behaviour. Finally, we outline how these notions may be generalised to multi-decision settings.
This journal paper extends our conference publication “Agent Incentives: A Causal Perspective”: the material on response incentives and instrumental control incentives is updated, while the work on impact incentives and multi-decision settings is entirely new.
About the journal:
The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.