{"title":"Rationality of Learning Algorithms in Repeated Normal-Form Games","authors":"Shivam Bajaj;Pranoy Das;Yevgeniy Vorobeychik;Vijay Gupta","doi":"10.1109/LCSYS.2024.3486631","DOIUrl":null,"url":null,"abstract":"Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether the agents have an incentive to unilaterally shift to an alternative learning algorithm. We capture such incentives as an algorithm’s rationality ratio, which is the ratio of the highest payoff an agent can obtain by unilaterally deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be c-rational if its rationality ratio is at most c irrespective of the game. We show that popular learning algorithms such as fictitious play and regret-matching are not c-rational for any constant \n<inline-formula> <tex-math>${\\mathrm { c}}\\geq 1$ </tex-math></inline-formula>\n. We also show that if an agent can only observe the actions of the other agents but not their payoffs, then there are games for which c-rational algorithms do not exist. We then propose a framework that can build upon any existing learning algorithm and establish, under mild assumptions, that our proposed algorithm is (i) c-rational for a given \n<inline-formula> <tex-math>${\\mathrm { c}}\\geq 1$ </tex-math></inline-formula>\n and (ii) the strategies of the agents converge to an equilibrium, with high probability, if all agents follow it.","PeriodicalId":37235,"journal":{"name":"IEEE Control Systems Letters","volume":"8 ","pages":"2409-2414"},"PeriodicalIF":2.4000,"publicationDate":"2024-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Control Systems Letters","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10735356/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Abstract
Many learning algorithms are known to converge to an equilibrium for specific classes of games if the same learning algorithm is adopted by all agents. However, when the agents are self-interested, a natural question is whether the agents have an incentive to unilaterally shift to an alternative learning algorithm. We capture such incentives as an algorithm's rationality ratio, which is the ratio of the highest payoff an agent can obtain by unilaterally deviating from a learning algorithm to its payoff from following it. We define a learning algorithm to be c-rational if its rationality ratio is at most c irrespective of the game. We show that popular learning algorithms such as fictitious play and regret-matching are not c-rational for any constant $c \geq 1$. We also show that if an agent can only observe the actions of the other agents but not their payoffs, then there are games for which c-rational algorithms do not exist. We then propose a framework that can build upon any existing learning algorithm and establish, under mild assumptions, that our proposed algorithm is (i) c-rational for a given $c \geq 1$ and (ii) the strategies of the agents converge to an equilibrium, with high probability, if all agents follow it.
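The sketch below is not from the paper; it is a minimal numerical illustration of what the rationality ratio measures, using an assumed 2x2 game, horizon, and deviation strategy chosen here for concreteness. The row player's payoff from following fictitious play (against a fictitious-play opponent) is compared with its payoff from one particular unilateral deviation: committing to a fixed mixed strategy that the opponent's fictitious play then learns to best-respond to. Because only a single deviation is tried, the reported ratio is only an empirical lower bound on the rationality ratio.

```python
import numpy as np

# Illustrative (assumed) 2x2 game: the row player wants to mismatch the
# column player's action, while the column player wants to match the row's.
A = np.array([[1.0, 3.0],   # row player's payoffs A[i, j]
              [2.0, 1.0]])
B = np.array([[1.0, 0.0],   # column player's payoffs B[i, j]
              [0.0, 1.0]])

T = 50_000                          # number of repeated rounds (assumption)
rng = np.random.default_rng(0)


def avg_row_payoff(row_policy, T=T):
    """Average row payoff over T rounds against a fictitious-play column player.

    row_policy(col_emp) returns the row action given the column player's
    empirical action frequencies; the column player best-responds to the
    row player's empirical action frequencies (fictitious play).
    """
    row_counts = np.ones(2)   # column player's counts of the row player's actions
    col_counts = np.ones(2)   # row player's counts of the column player's actions
    total = 0.0
    for _ in range(T):
        row_emp = row_counts / row_counts.sum()
        col_emp = col_counts / col_counts.sum()
        j = int(np.argmax(B.T @ row_emp))   # column's fictitious-play best response
        i = row_policy(col_emp)             # row player's action this round
        total += A[i, j]
        row_counts[i] += 1
        col_counts[j] += 1
    return total / T


# (a) Row player also follows fictitious play: best-respond to col_emp.
follow = avg_row_payoff(lambda col_emp: int(np.argmax(A @ col_emp)))

# (b) Row player unilaterally deviates: commit to playing its first action with
#     probability 0.49, which the opponent's fictitious play learns to exploit
#     in a way that favors the deviator (an assumed, hand-picked deviation).
deviate = avg_row_payoff(lambda _col_emp: int(rng.random() >= 0.49))

print(f"payoff from following fictitious play: {follow:.3f}")
print(f"payoff from the commitment deviation : {deviate:.3f}")
print(f"empirical lower bound on the ratio   : {deviate / follow:.3f}")
```

In this assumed game the ratio estimated above exceeds one, consistent with the abstract's claim that fictitious play is not c-rational in general; the paper's result is the stronger statement that for any constant $c \geq 1$ there exist games whose rationality ratio exceeds c.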