Simple Uncoupled No-regret Learning Dynamics for Extensive-form Correlated Equilibrium

IF 2.3 · Zone 2 (Computer Science) · JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Journal of the ACM · Pub Date: 2022-11-18 · DOI: https://dl.acm.org/doi/10.1145/3563772

Gabriele Farina, Andrea Celli, Alberto Marchesi, Nicola Gatti

Citations: 0

Abstract

The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature and presence of private information in the game, correlation in extensive-form games possesses significantly different properties than in normal-form games, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device (a.k.a. mediator) must take into account the evolution of beliefs of each player as they make observations throughout the game. Due to that significant added complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. In this article, we settle that question by giving the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. We show that each iterate can be computed in time polynomial in the size of the game tree, and that, when all players play repeatedly according to our learning dynamics, the empirical frequency of play after T game repetitions is proven to be a \( O(1/\sqrt {T}) \)-approximate EFCE with high probability, and an EFCE almost surely in the limit.
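The abstract builds on the classic normal-form result: if every player independently minimizes internal (swap) regret, the empirical distribution of play approaches the set of correlated equilibria. The sketch below illustrates only that normal-form starting point, not the paper's EFCE dynamics, which work on the game tree and are not reproduced here. Several choices are assumptions made for brevity rather than anything taken from the paper: each player runs the Blum-Mansour swap-regret reduction on top of regret matching; we average the product of the players' mixed strategies instead of sampled action profiles, so the run is deterministic; and the Chicken payoffs and the names `SwapRegretLearner`, `ce_gap`, and `run` are hypothetical. The dynamics are uncoupled in the sense that each learner only ever sees the utility of its own actions.

```python
import numpy as np


def regret_matching(cum_regret):
    """Regret matching: mix in proportion to positive cumulative regret (uniform if none is positive)."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    if total > 0:
        return pos / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))


def stationary(Q):
    """Return a distribution p with p @ Q == p for a row-stochastic matrix Q."""
    n = Q.shape[0]
    lhs = np.vstack([Q.T - np.eye(n), np.ones((1, n))])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    p = np.linalg.lstsq(lhs, rhs, rcond=None)[0]
    p = np.maximum(p, 0.0)  # clear tiny negative round-off
    return p / p.sum()


class SwapRegretLearner:
    """Hypothetical helper: Blum-Mansour reduction, one regret-matching copy per action."""

    def __init__(self, n_actions):
        # cum[i, k]: cumulative regret of copy i for not having recommended action k
        self.cum = np.zeros((n_actions, n_actions))

    def strategy(self):
        self.Q = np.array([regret_matching(row) for row in self.cum])
        self.p = stationary(self.Q)  # play a fixed point of the copies' recommendations
        return self.p

    def observe(self, u):
        """u[k] is this player's expected utility for action k; only its own payoffs are used."""
        for i, q in enumerate(self.Q):
            # copy i is charged the scaled utility vector p[i] * u
            self.cum[i] += self.p[i] * (u - q @ u)


def ce_gap(mu, A, B):
    """Largest total gain either player gets from a swap deviation under joint distribution mu."""
    # gain_row[j, k]: row player's gain from playing k whenever told j
    gain_row = mu @ A.T - np.sum(mu * A, axis=1, keepdims=True)
    # gain_col[l, m]: column player's gain from playing m whenever told l
    gain_col = mu.T @ B - np.sum(mu * B, axis=0)[:, None]
    return max(gain_row.max(axis=1).sum(), gain_col.max(axis=1).sum())


def run(A, B, T):
    """Both players learn independently; return the average joint distribution of play."""
    row, col = SwapRegretLearner(A.shape[0]), SwapRegretLearner(A.shape[1])
    mu = np.zeros(A.shape)
    for _ in range(T):
        x, y = row.strategy(), col.strategy()
        mu += np.outer(x, y)  # expected joint play this round
        row.observe(A @ y)    # row player's utility for each of its actions
        col.observe(B.T @ x)  # column player's utility for each of its actions
    return mu / T


if __name__ == "__main__":
    # Chicken with actions (Dare, Chicken); B = A.T because the game is symmetric
    A = np.array([[0.0, 7.0], [2.0, 6.0]])
    B = A.T.copy()
    mu = run(A, B, T=2000)
    print(np.round(mu, 3), ce_gap(mu, A, B))
```

Because every copy is plain regret matching, the standard argument bounding its external regret also bounds each player's swap regret, and hence the CE gap of the averaged play, by roughly n·Δ/√T for n actions and payoff range Δ (14/√T in this example). That 1/√T shape is only the normal-form analogue of the rate the abstract states for EFCE; the paper's extensive-form construction is different.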

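Source journal: Journal of the ACM (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 7.50

Self-citation rate: 0.00%

Articles published: not listed

Review time: 3 months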

Journal description: The best indicator of the scope of the journal is provided by the areas covered by its Editorial Board. These areas change from time to time, as the field evolves. The following areas are currently covered by a member of the Editorial Board: Algorithms and Combinatorial Optimization; Algorithms and Data Structures; Algorithms, Combinatorial Optimization, and Games; Artificial Intelligence; Complexity Theory; Computational Biology; Computational Geometry; Computer Graphics and Computer Vision; Computer-Aided Verification; Cryptography and Security; Cyber-Physical, Embedded, and Real-Time Systems; Database Systems and Theory; Distributed Computing; Economics and Computation; Information Theory; Logic and Computation; Logic, Algorithms, and Complexity; Machine Learning and Computational Learning Theory; Networking; Parallel Computing and Architecture; Programming Languages; Quantum Computing; Randomized Algorithms and Probabilistic Analysis of Algorithms; Scientific Computing and High Performance Computing; Software Engineering; Web Algorithms and Data Mining