Rationality of Learning Algorithms in Repeated Normal-Form Games

Impact factor: 2.4 · JCR Q2 (Automation & Control Systems)
Shivam Bajaj;Pranoy Das;Yevgeniy Vorobeychik;Vijay Gupta
Journal article. IEEE Control Systems Letters, vol. 8, pp. 2409-2414. Published 2024-10-25. DOI: 10.1109/LCSYS.2024.3486631. Source: https://ieeexplore.ieee.org/document/10735356/
Citations: 0

Abstract

Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether an agent has an incentive to unilaterally switch to an alternative learning algorithm. We capture such incentives through an algorithm's rationality ratio: the ratio of the highest payoff an agent can obtain by unilaterally deviating from the learning algorithm to its payoff from following it. We define a learning algorithm to be c-rational if its rationality ratio is at most c irrespective of the game. We show that popular learning algorithms such as fictitious play and regret matching are not c-rational for any constant $c \geq 1$. We also show that if an agent can observe only the actions of the other agents and not their payoffs, then there are games for which c-rational algorithms do not exist. We then propose a framework that can build upon any existing learning algorithm and establish, under mild assumptions, that (i) our proposed algorithm is c-rational for a given $c \geq 1$ and (ii) the strategies of the agents converge to an equilibrium, with high probability, if all agents follow it.
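The rationality ratio described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's construction or its formal definition, which the abstract does not give. It assumes a two-agent repeated 2x2 game, a finite horizon, fictitious play with a uniform prior and ties broken toward the lowest action index, and a deviation class restricted to "always play one fixed action". Under these assumptions it computes only a lower bound on the ratio for the row agent. The game family parameterized by `k` and the names `fictitious_play_action`, `average_row_payoff`, and `empirical_ratio` are hypothetical.

```python
import numpy as np


def fictitious_play_action(payoff, opp_counts):
    """Best response to the empirical counts of the opponent's past actions."""
    # payoff[a, b] is this agent's payoff for own action a, opponent action b.
    # np.argmax breaks ties toward the lowest index (an assumed tie-break rule).
    return int(np.argmax(payoff @ opp_counts))


def average_row_payoff(A, B, rounds, row_policy=None):
    """Average payoff of the row agent over `rounds` repetitions.

    A[i, j] and B[i, j] are the row and column agents' payoffs when the row
    agent plays i and the column agent plays j. The column agent always runs
    fictitious play. The row agent also runs it unless `row_policy` (a fixed
    action index) is given, which models a unilateral deviation.
    """
    col_counts = np.ones(A.shape[1])  # row agent's belief counts (uniform prior)
    row_counts = np.ones(A.shape[0])  # column agent's belief counts (uniform prior)
    total = 0.0
    for _ in range(rounds):
        if row_policy is None:
            i = fictitious_play_action(A, col_counts)
        else:
            i = row_policy
        j = fictitious_play_action(B.T, row_counts)
        total += A[i, j]
        col_counts[j] += 1
        row_counts[i] += 1
    return total / rounds


def empirical_ratio(k, rounds=1000):
    """Lower bound on the row agent's rationality ratio in a hypothetical game.

    Row action 0 strictly dominates row action 1, so fictitious play keeps
    the row agent on action 0, while the column agent wants to match the row.
    """
    A = np.array([[1.0, k + 1.0], [0.0, float(k)]])
    B = np.array([[1.0, 0.0], [0.0, 1.0]])
    follow = average_row_payoff(A, B, rounds)
    # Only constant-action deviations are tried, hence a lower bound.
    best_deviation = max(
        average_row_payoff(A, B, rounds, row_policy=a) for a in range(A.shape[0])
    )
    return best_deviation / follow


if __name__ == "__main__":
    for k in (2, 10, 100):
        print(k, empirical_ratio(k))
```

In this assumed family, following fictitious play locks the row agent into its dominant action, while committing to the dominated action steers the column agent to the outcome worth `k`. The measured ratio therefore grows with `k`, which matches the abstract's claim that no constant c bounds the ratio across all games.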
Source journal: IEEE Control Systems Letters (Mathematics: Control and Optimization)
CiteScore: 4.40
Self-citation rate: 13.30%
Articles published: 471