Incentives for responsiveness, instrumental control and impact

IF 4.6, Zone 2 (Computer Science), JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Artificial Intelligence, Volume 348, Article 104408. Pub Date: 2025-09-02. DOI: 10.1016/j.artint.2025.104408

Ryan Carey, Eric Langlois, Chris van Merwijk, Shane Legg, Tom Everitt

Citations: 0

Incentives for responsiveness, instrumental control and impact

We introduce three concepts that describe an agent's incentives: response incentives indicate which variables in the environment, such as sensitive demographic information, affect the decision under the optimal policy. Instrumental control incentives indicate whether an agent's policy is chosen to manipulate part of its environment, such as the preferences or instructions of a user. Impact incentives indicate which variables an agent will affect, intentionally or otherwise. For each concept, we establish sound and complete graphical criteria, and discuss general classes of techniques that may be used to produce incentives for safe and fair agent behaviour. Finally, we outline how these notions may be generalised to multi-decision settings.

This journal paper extends our conference publication “Agent Incentives: A Causal Perspective”: the material on response incentives and instrumental control incentives is updated, while the work on impact incentives and multi-decision settings is entirely new.

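Source journal

Artificial Intelligence (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 11.20

Self-citation rate: 1.40%

Articles published: 118

Review time: 8 months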

Journal description: The Journal of Artificial Intelligence (AIJ) welcomes papers covering a broad spectrum of AI topics, including cognition, automated reasoning, computer vision, machine learning, and more. Papers should demonstrate advancements in AI and propose innovative approaches to AI problems. Additionally, the journal accepts papers describing AI applications, focusing on how new methods enhance performance rather than reiterating conventional approaches. In addition to regular papers, AIJ also accepts Research Notes, Research Field Reviews, Position Papers, Book Reviews, and summary papers on AI challenges and competitions.