Unilateral incentive alignment in two-agent stochastic games

IF 9.1 | Zone 1, multidisciplinary journals | JCR Q1, Multidisciplinary Sciences

Proceedings of the National Academy of Sciences of the United States of America | Pub Date: 2025-06-16 | DOI: 10.1073/pnas.2319927121

Alex McAvoy, Udari Madhushani Sehwag, Christian Hilbe, Krishnendu Chatterjee, Wolfram Barfuss, Qi Su, Naomi Ehrich Leonard, Joshua B. Plotkin


Citations: 0

Abstract

Multiagent learning is challenging when agents face mixed-motivation interactions, where conflicts of interest arise as agents independently try to optimize their respective outcomes. Recent advancements in evolutionary game theory have identified a class of “zero-determinant” strategies, which confer an agent with significant unilateral control over outcomes in repeated games. Building on these insights, we present a comprehensive generalization of zero-determinant strategies to stochastic games, encompassing dynamic environments. We propose an algorithm that allows an agent to discover strategies enforcing predetermined linear (or approximately linear) payoff relationships. Of particular interest is the relationship in which both payoffs are equal, which serves as a proxy for fairness in symmetric games. We demonstrate that an agent can discover strategies enforcing such relationships through experience alone, without coordinating with an opponent. In finding and using such a strategy, an agent (“enforcer”) can incentivize optimal and equitable outcomes, circumventing potential exploitation. In particular, from the opponent’s viewpoint, the enforcer transforms a mixed-motivation problem into a cooperative problem, paving the way for more collaboration and fairness in multiagent systems.
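To make the idea of "enforcing a linear payoff relationship" concrete, here is a minimal sketch of the classical repeated-game case that the abstract builds on. It is not the paper's algorithm for stochastic games. It assumes an undiscounted iterated prisoner's dilemma with the conventional payoffs R=3, S=0, T=5, P=1 (values chosen for illustration, not taken from the paper) and uses the well-known zero-determinant construction for memory-one strategies. The function names `equalizing_strategy` and `long_run_payoffs` are hypothetical.

```python
import random

# Prisoner's dilemma payoffs (conventional values; an assumption, not from the paper).
R, S, T, P = 3.0, 0.0, 5.0, 1.0
# States are ordered from X's perspective as (X's move, Y's move): CC, CD, DC, DD.
PAYOFF_X = [R, S, T, P]
PAYOFF_Y = [R, T, S, P]


def equalizing_strategy(phi):
    """Memory-one strategy p (probability that X cooperates after each state)
    with p - (1, 1, 0, 0) = phi * (PAYOFF_X - PAYOFF_Y).
    By the zero-determinant construction this enforces pi_X = pi_Y."""
    base = [1.0, 1.0, 0.0, 0.0]
    p = [b + phi * (x - y) for b, x, y in zip(base, PAYOFF_X, PAYOFF_Y)]
    if not all(0.0 <= v <= 1.0 for v in p):
        raise ValueError("phi is outside the range that yields valid probabilities")
    return p


def long_run_payoffs(p, q, steps=20000):
    """Long-run average payoffs when X plays p and Y plays the memory-one
    strategy q (indexed from Y's own perspective), via power iteration on the
    four-state Markov chain over outcomes."""
    swap = [0, 2, 1, 3]  # X's view of a state -> Y's view of the same state
    v = [0.25, 0.25, 0.25, 0.25]
    for _ in range(steps):
        nxt = [0.0, 0.0, 0.0, 0.0]
        for s in range(4):
            px, qy = p[s], q[swap[s]]
            nxt[0] += v[s] * px * qy
            nxt[1] += v[s] * px * (1.0 - qy)
            nxt[2] += v[s] * (1.0 - px) * qy
            nxt[3] += v[s] * (1.0 - px) * (1.0 - qy)
        v = nxt
    pi_x = sum(w * u for w, u in zip(v, PAYOFF_X))
    pi_y = sum(w * u for w, u in zip(v, PAYOFF_Y))
    return pi_x, pi_y


if __name__ == "__main__":
    rng = random.Random(0)
    p = equalizing_strategy(0.1)  # yields (1, 0.5, 0.5, 0)
    for _ in range(3):
        # Arbitrary opponent with interior probabilities, unknown to X.
        q = [rng.uniform(0.1, 0.9) for _ in range(4)]
        print(long_run_payoffs(p, q))
```

Whatever memory-one strategy the opponent picks, the two long-run payoffs printed for each opponent should coincide up to numerical error. The level of the shared payoff still depends on the opponent, which is why the opponent's best interest becomes aligned with the enforcer's. Choosing phi = 1/(T - S) recovers tit-for-tat under these assumptions.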




Source journal

Proceedings of the National Academy of Sciences of the United States of America (multidisciplinary journals)

CiteScore: 19.00
Self-citation rate: 0.90%
Articles published: 3575
Review time: 2.5 months

Journal description: The Proceedings of the National Academy of Sciences (PNAS), a peer-reviewed journal of the National Academy of Sciences (NAS), serves as an authoritative source for high-impact, original research across the biological, physical, and social sciences. With a global scope, the journal welcomes submissions from researchers worldwide, making it an inclusive platform for advancing scientific knowledge.