Rationality of Learning Algorithms in Repeated Normal-Form Games

Impact factor: 2.4 · JCR Q2 (Automation & Control Systems)

IEEE Control Systems Letters · Published: 2024-10-25 · DOI: 10.1109/LCSYS.2024.3486631

Shivam Bajaj;Pranoy Das;Yevgeniy Vorobeychik;Vijay Gupta


Citations: 0

Abstract

Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether the agents have an incentive to unilaterally shift to an alternative learning algorithm. We capture such incentives as an algorithm’s rationality ratio, which is the ratio of the highest payoff an agent can obtain by unilaterally deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be c-rational if its rationality ratio is at most c irrespective of the game. We show that popular learning algorithms such as fictitious play and regret-matching are not c-rational for any constant $c \geq 1$. We also show that if an agent can only observe the actions of the other agents but not their payoffs, then there are games for which c-rational algorithms do not exist. We then propose a framework that can build upon any existing learning algorithm and establish, under mild assumptions, that our proposed algorithm is (i) c-rational for a given $c \geq 1$ and (ii) the strategies of the agents converge to an equilibrium, with high probability, if all agents follow it.
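To make the rationality ratio concrete, the following is a minimal Python sketch, not the paper's construction. It assumes a hypothetical 2x2 game (the matrices `ROW` and `COL`, the parameter `M`, the uniform prior counts, and the lowest-index tie-breaking rule are all illustrative choices), uses time-averaged payoff over a finite horizon as a stand-in for the paper's payoff notion, and considers only one deviation (always playing the dominated action) rather than the best one. The column player always runs fictitious play; the row player either follows fictitious play or deviates.

```python
from typing import Callable, List, Sequence

Matrix = Sequence[Sequence[int]]

M = 10  # hypothetical scale parameter; larger M widens the gap

# ROW[a][b]: row player's payoff for row action a (0=U, 1=D) vs column action b (0=L, 1=R).
ROW: Matrix = [[1, M + 1], [0, M]]
# COL[b][a]: column player's payoff for column action b vs row action a.
COL: Matrix = [[1, 0], [0, 1]]


def best_response(payoff: Matrix, opp_counts: Sequence[int]) -> int:
    """Best response to the opponent's empirical action counts.

    Scores are left unnormalized (dividing by the total count would not
    change the argmax). Ties go to the lowest action index.
    """
    scores = [sum(p * c for p, c in zip(row, opp_counts)) for row in payoff]
    return max(range(len(scores)), key=scores.__getitem__)


def average_row_payoff(row_policy: Callable[[List[int]], int], rounds: int) -> float:
    """Play the repeated game; the column player always uses fictitious play."""
    row_counts = [1, 1]  # column player's counts of row actions (uniform prior, an assumption)
    col_counts = [1, 1]  # row player's counts of column actions (uniform prior, an assumption)
    total = 0
    for _ in range(rounds):
        a = row_policy(col_counts)
        b = best_response(COL, row_counts)
        total += ROW[a][b]
        row_counts[a] += 1
        col_counts[b] += 1
    return total / rounds


ROUNDS = 1000
# Row player follows fictitious play.
follow_avg = average_row_payoff(lambda counts: best_response(ROW, counts), ROUNDS)
# Row player deviates by always playing D, steering the column learner toward R.
deviate_avg = average_row_payoff(lambda counts: 1, ROUNDS)
# Lower bound on the ratio for this game, since only one deviation is tried.
ratio_lower_bound = deviate_avg / follow_avg

print(follow_avg, deviate_avg, ratio_lower_bound)
```

In this toy game U strictly dominates D, so a fictitious-play row player always plays U, the column learner settles on L, and the row player earns 1 per round. Committing to D instead induces the column learner to play R after the first round, which yields roughly `M` per round. Because `M` can be chosen freely, the gap grows without bound, which gives some intuition for why no constant c could bound the ratio across all games.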


Source journal

IEEE Control Systems Letters (Mathematics: Control and Optimization)

CiteScore: 4.40

Self-citation rate: 13.30%

Number of publications: 471