Evolutionary Dynamics and Phi-Regret Minimization in Games

Impact factor: 4.0 · Region 3 (Computer Science) · JCR Q2 (Computer Science, Artificial Intelligence)

Journal of Artificial Intelligence Research Pub Date : 2022-07-08 DOI:10.1613/jair.1.13187

G. Piliouras, Mark Rowland, Shayegan Omidshafiei, R. Élie, Daniel Hennes, Jerome T. Connor, K. Tuyls

Pages: 1125-1158


Abstract

Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games. Regret quantifies the difference between a learner’s performance and that of a baseline in hindsight. It is well known that regret-minimizing algorithms converge to certain classes of equilibria in games; however, traditional forms of regret used in game theory predominantly consider baselines that permit deviations to deterministic actions or strategies. In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full mixed strategy space (i.e., probability distributions over pure strategies), under the lens of the previously established Φ-regret framework, which provides a continuum of stronger regret measures. Importantly, Φ-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms. We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of Φ-regret in generic 2 × 2 games, without any modification of the underlying algorithm itself. We subsequently conduct experiments validating our theoretical results in a suite of 144 2 × 2 games wherein RD exhibits a diverse set of behaviors. We conclude by providing empirical evidence of Φ-regret minimization by RD in some larger games, hinting at further opportunity for Φ-regret based study of such algorithms from both a theoretical and empirical perspective.
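As a rough illustration of the setting described in the abstract, the following minimal sketch simulates replicator dynamics in a 2 × 2 game and tracks the row player's time-averaged external regret. It is not the authors' code or experimental setup. The Euler discretization, the step size `dt`, the choice of matching pennies as the example game, and the function names `replicator_step` and `average_external_regret` are all assumptions made here for illustration. It measures only external regret (deviations to a single fixed pure action), which is the weakest of the notions the abstract lists, and does not compute Φ-regret over partitions of the mixed strategy space.

```python
import numpy as np


def replicator_step(x, y, A, B, dt):
    """One Euler step of two-population replicator dynamics (assumed discretization)."""
    u = A @ y        # row player's payoff for each pure action against y
    v = B.T @ x      # column player's payoff for each pure action against x
    # Each action's share grows in proportion to its payoff advantage
    # over the current population average.
    x_new = x + dt * x * (u - x @ u)
    y_new = y + dt * y * (v - y @ v)
    return x_new, y_new


def average_external_regret(A, B, x0, y0, dt=0.01, steps=20000):
    """Time-averaged external regret of the row player along the RD trajectory."""
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    cum_action = np.zeros(A.shape[0])  # payoff each fixed pure action would have earned
    cum_realized = 0.0                 # payoff the mixed strategy actually earned
    for _ in range(steps):
        u = A @ y
        cum_action += dt * u
        cum_realized += dt * (x @ u)
        x, y = replicator_step(x, y, A, B, dt)
    horizon = dt * steps
    return (cum_action.max() - cum_realized) / horizon, x, y


# Matching pennies: a zero-sum 2 x 2 game, chosen here only as an example.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A

if __name__ == "__main__":
    regret, x, y = average_external_regret(A, B, [0.3, 0.7], [0.6, 0.4])
    print("average external regret:", regret)
    print("final strategies:", x, y)
```

In continuous time the row player's cumulative external regret under RD is bounded by a constant that depends on the initial strategy, so the time average shrinks as the horizon grows. The Euler step adds a discretization error that scales with `dt`, so a smaller step or a dedicated ODE integrator would track the continuous dynamics more closely.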


Source journal: Journal of Artificial Intelligence Research (Engineering and Technology; Computer Science: Artificial Intelligence)

CiteScore: 9.60

Self-citation rate: 4.00%

Review time: 4 months

About the journal: JAIR (ISSN 1076-9757) covers all areas of artificial intelligence (AI), publishing refereed research articles, survey articles, and technical notes. Established in 1993 as one of the first electronic scientific journals, JAIR is indexed by INSPEC, Science Citation Index, and MathSciNet. JAIR reviews papers within approximately three months of submission and publishes accepted articles on the internet immediately upon receiving the final versions. JAIR articles are published for free distribution on the internet by the AI Access Foundation, and for purchase in bound volumes by AAAI Press.