{"title":"正则博弈中具有先验收益信息的强化学习","authors":"Naoki Funai","doi":"10.2139/ssrn.3343373","DOIUrl":null,"url":null,"abstract":"This paper studies the reinforcement learning of Erev and Roth with foregone payoff information in normal form games: players observe not only the realised payoffs but also the ones which they could have obtained if they had chosen the other actions. We provide conditions under which the reinforcement learning process converges to a mixed action profile at which each action is chosen with a probability proportional to its expected payoff. In pure coordination games, the mixed action profile corresponds to the mixed Nash equilibrium.","PeriodicalId":356570,"journal":{"name":"CompSciRN: Problem Solving (Topic)","volume":"196 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Reinforcement Learning with Foregone Payoff Information in Normal Form Games\",\"authors\":\"Naoki Funai\",\"doi\":\"10.2139/ssrn.3343373\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper studies the reinforcement learning of Erev and Roth with foregone payoff information in normal form games: players observe not only the realised payoffs but also the ones which they could have obtained if they had chosen the other actions. We provide conditions under which the reinforcement learning process converges to a mixed action profile at which each action is chosen with a probability proportional to its expected payoff. In pure coordination games, the mixed action profile corresponds to the mixed Nash equilibrium.\",\"PeriodicalId\":356570,\"journal\":{\"name\":\"CompSciRN: Problem Solving (Topic)\",\"volume\":\"196 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"CompSciRN: Problem Solving (Topic)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2139/ssrn.3343373\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"CompSciRN: Problem Solving (Topic)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2139/ssrn.3343373","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Reinforcement Learning with Foregone Payoff Information in Normal Form Games
This paper studies the reinforcement learning model of Erev and Roth with foregone payoff information in normal form games: players observe not only the payoffs they realise but also those they could have obtained had they chosen other actions. We provide conditions under which the reinforcement learning process converges to a mixed action profile in which each action is chosen with probability proportional to its expected payoff. In pure coordination games, this mixed action profile coincides with the mixed Nash equilibrium.
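To make the learning rule concrete, here is a minimal simulation sketch of an Erev–Roth-style propensity update extended with foregone payoffs, played in a 2x2 pure coordination (matching) game. The specific game, the nonnegative-payoff assumption, and the unnormalised propensity rule are illustrative choices, not the paper's exact specification or conditions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2x2 pure coordination game: both players earn 1 if their
# actions match, 0 otherwise. Payoffs are assumed nonnegative so that
# propensities stay positive.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 1.0]])

def choice_probs(q):
    """Erev-Roth choice rule: each action is chosen with probability
    proportional to its accumulated propensity."""
    return q / q.sum()

# Strictly positive initial propensities so every action can be sampled.
q1 = np.ones(2)
q2 = np.ones(2)

for t in range(50_000):
    a1 = rng.choice(2, p=choice_probs(q1))
    a2 = rng.choice(2, p=choice_probs(q2))

    # Foregone-payoff update: every action is reinforced by the payoff it
    # yielded -- or would have yielded -- against the opponent's realised
    # action, not just the action actually chosen.
    q1 += PAYOFF[:, a2]   # row player's payoff for each of her own actions
    q2 += PAYOFF[a1, :]   # column player's payoff for each of his own actions

print("player 1 mixed action:", choice_probs(q1))
print("player 2 mixed action:", choice_probs(q2))
```

Under this full-information update, each propensity grows at a rate equal to that action's average payoff against the opponent's realised play, so the choice probabilities track a profile in which each action's probability is proportional to its expected payoff, as described in the abstract. In this symmetric matching game that profile is the mixed Nash equilibrium (1/2, 1/2), though whether a given run settles there depends on the convergence conditions established in the paper.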