双智能体随机博弈中的单边激励对齐

IF 9.1 1区 综合性期刊 Q1 MULTIDISCIPLINARY SCIENCES
Alex McAvoy, Udari Madhushani Sehwag, Christian Hilbe, Krishnendu Chatterjee, Wolfram Barfuss, Qi Su, Naomi Ehrich Leonard, Joshua B. Plotkin
{"title":"双智能体随机博弈中的单边激励对齐","authors":"Alex McAvoy, Udari Madhushani Sehwag, Christian Hilbe, Krishnendu Chatterjee, Wolfram Barfuss, Qi Su, Naomi Ehrich Leonard, Joshua B. Plotkin","doi":"10.1073/pnas.2319927121","DOIUrl":null,"url":null,"abstract":"Multiagent learning is challenging when agents face mixed-motivation interactions, where conflicts of interest arise as agents independently try to optimize their respective outcomes. Recent advancements in evolutionary game theory have identified a class of “zero-determinant” strategies, which confer an agent with significant unilateral control over outcomes in repeated games. Building on these insights, we present a comprehensive generalization of zero-determinant strategies to stochastic games, encompassing dynamic environments. We propose an algorithm that allows an agent to discover strategies enforcing predetermined linear (or approximately linear) payoff relationships. Of particular interest is the relationship in which both payoffs are equal, which serves as a proxy for fairness in symmetric games. We demonstrate that an agent can discover strategies enforcing such relationships through experience alone, without coordinating with an opponent. In finding and using such a strategy, an agent (“enforcer”) can incentivize optimal and equitable outcomes, circumventing potential exploitation. In particular, from the opponent’s viewpoint, the enforcer transforms a mixed-motivation problem into a cooperative problem, paving the way for more collaboration and fairness in multiagent systems.","PeriodicalId":20548,"journal":{"name":"Proceedings of the National Academy of Sciences of the United States of America","volume":"92 1","pages":""},"PeriodicalIF":9.1000,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Unilateral incentive alignment in two-agent stochastic games\",\"authors\":\"Alex McAvoy, Udari Madhushani Sehwag, Christian Hilbe, Krishnendu Chatterjee, Wolfram Barfuss, Qi Su, Naomi Ehrich Leonard, Joshua B. Plotkin\",\"doi\":\"10.1073/pnas.2319927121\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multiagent learning is challenging when agents face mixed-motivation interactions, where conflicts of interest arise as agents independently try to optimize their respective outcomes. Recent advancements in evolutionary game theory have identified a class of “zero-determinant” strategies, which confer an agent with significant unilateral control over outcomes in repeated games. Building on these insights, we present a comprehensive generalization of zero-determinant strategies to stochastic games, encompassing dynamic environments. We propose an algorithm that allows an agent to discover strategies enforcing predetermined linear (or approximately linear) payoff relationships. Of particular interest is the relationship in which both payoffs are equal, which serves as a proxy for fairness in symmetric games. We demonstrate that an agent can discover strategies enforcing such relationships through experience alone, without coordinating with an opponent. In finding and using such a strategy, an agent (“enforcer”) can incentivize optimal and equitable outcomes, circumventing potential exploitation. In particular, from the opponent’s viewpoint, the enforcer transforms a mixed-motivation problem into a cooperative problem, paving the way for more collaboration and fairness in multiagent systems.\",\"PeriodicalId\":20548,\"journal\":{\"name\":\"Proceedings of the National Academy of Sciences of the United States of America\",\"volume\":\"92 1\",\"pages\":\"\"},\"PeriodicalIF\":9.1000,\"publicationDate\":\"2025-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the National Academy of Sciences of the United States of America\",\"FirstCategoryId\":\"103\",\"ListUrlMain\":\"https://doi.org/10.1073/pnas.2319927121\",\"RegionNum\":1,\"RegionCategory\":\"综合性期刊\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"MULTIDISCIPLINARY SCIENCES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the National Academy of Sciences of the United States of America","FirstCategoryId":"103","ListUrlMain":"https://doi.org/10.1073/pnas.2319927121","RegionNum":1,"RegionCategory":"综合性期刊","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MULTIDISCIPLINARY SCIENCES","Score":null,"Total":0}
引用次数: 0

摘要

当智能体面临混合动机交互时,多智能体学习是具有挑战性的,当智能体独立尝试优化各自的结果时,会产生利益冲突。进化博弈论的最新进展已经确定了一类“零决定”策略,它赋予代理人对重复博弈结果的显著单方面控制。在这些见解的基础上,我们提出了零决定策略对随机博弈的全面概括,包括动态环境。我们提出了一种算法,允许代理发现执行预定线性(或近似线性)收益关系的策略。我们特别感兴趣的是两种收益相等的关系,这是对称游戏中公平性的代表。我们证明了智能体可以通过经验发现执行这种关系的策略,而不需要与对手协调。在发现和使用这种策略时,代理人(“执行者”)可以激励最优和公平的结果,避免潜在的剥削。特别是,从对手的角度来看,执行者将混合动机问题转化为合作问题,为多智能体系统中更多的协作和公平铺平了道路。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Unilateral incentive alignment in two-agent stochastic games
Multiagent learning is challenging when agents face mixed-motivation interactions, where conflicts of interest arise as agents independently try to optimize their respective outcomes. Recent advancements in evolutionary game theory have identified a class of “zero-determinant” strategies, which confer an agent with significant unilateral control over outcomes in repeated games. Building on these insights, we present a comprehensive generalization of zero-determinant strategies to stochastic games, encompassing dynamic environments. We propose an algorithm that allows an agent to discover strategies enforcing predetermined linear (or approximately linear) payoff relationships. Of particular interest is the relationship in which both payoffs are equal, which serves as a proxy for fairness in symmetric games. We demonstrate that an agent can discover strategies enforcing such relationships through experience alone, without coordinating with an opponent. In finding and using such a strategy, an agent (“enforcer”) can incentivize optimal and equitable outcomes, circumventing potential exploitation. In particular, from the opponent’s viewpoint, the enforcer transforms a mixed-motivation problem into a cooperative problem, paving the way for more collaboration and fairness in multiagent systems.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
19.00
自引率
0.90%
发文量
3575
审稿时长
2.5 months
期刊介绍: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信