Stochastic Games with Minimally Bounded Action Costs
David Mguni
arXiv - CS - Multiagent Systems, 2024-07-25
DOI: arxiv-2407.18010 (https://doi.org/arxiv-2407.18010)

In many multi-player interactions, players incur strictly positive costs each
time they execute actions, e.g. 'menu costs' or transaction costs in financial
systems. Since acting at each available opportunity would accumulate
prohibitively large costs, the resulting decision problem is one in which
players must make strategic decisions about when to execute actions in addition
to their choice of action. This paper analyses a discrete-time stochastic game
(SG) in which players face minimally bounded positive costs for each action and
influence the system using impulse controls. We prove that SGs with two-sided impulse
control have a unique value and characterise the saddle point equilibrium in
which the players execute actions at strategically chosen times in accordance
with Markovian strategies. We prove the game respects a dynamic programming
principle and that the Markov perfect equilibrium can be computed as a limit
point of a sequence of Bellman operations. We then introduce a new Q-learning
variant that we show converges almost surely to the value of the game, enabling
solutions to be extracted in unknown settings. Lastly, we extend our results to
settings with budgetary constraints.
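
To make the "when to act" trade-off concrete, the following is a minimal sketch of repeated Bellman operations for a heavily simplified, single-controller version of the problem. It is not the paper's two-player algorithm: the function name `impulse_value_iteration`, the two-state toy model, the discount factor and all numerical values are assumptions chosen purely for illustration. At each state the controller either waits for free or pays a strictly positive cost to apply an impulse, and the iteration converges because the operator is a contraction for a discount factor below one.

```python
import numpy as np


def impulse_value_iteration(P_wait, P_act, reward, cost, gamma=0.9,
                            tol=1e-10, max_iter=10_000):
    """Value iteration for a toy single-controller impulse problem.

    P_wait : (S, S) transition matrix when no action is executed.
    P_act  : (A, S, S) transition matrices, one per impulse action.
    reward : (S,) state reward, collected whether or not an action is taken.
    cost   : (A,) action costs, assumed strictly positive (bounded below).
    Returns the value function and a boolean mask of states where
    intervening is strictly better than waiting.
    """
    cost = np.asarray(cost, dtype=float)
    if not np.all(cost > 0):
        raise ValueError("every action cost must be strictly positive")

    V = np.zeros(len(reward))
    for _ in range(max_iter):
        # Value of waiting: no cost, uncontrolled dynamics.
        q_wait = reward + gamma * (P_wait @ V)
        # Value of each impulse: pay its cost, then follow its dynamics.
        q_act = reward[None, :] - cost[:, None] + gamma * (P_act @ V)
        V_new = np.maximum(q_wait, q_act.max(axis=0))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Recompute both branches at the final V to read off the policy.
    q_wait = reward + gamma * (P_wait @ V)
    q_act = reward[None, :] - cost[:, None] + gamma * (P_act @ V)
    intervene = q_act.max(axis=0) > q_wait  # ties resolved in favour of waiting
    return V, intervene


# Hypothetical toy model: state 0 is "bad" (reward 0), state 1 is "good" (reward 1).
reward = np.array([0.0, 1.0])
# Without intervention the good state decays to the bad one with probability 0.2.
P_wait = np.array([[1.0, 0.0],
                   [0.2, 0.8]])
# A single impulse action that resets the system to the good state.
P_act = np.array([[[0.0, 1.0],
                   [0.0, 1.0]]])

V, intervene = impulse_value_iteration(P_wait, P_act, reward, cost=[0.5])
print("values:", V)
print("intervene in state:", intervene)
```

In this toy model the controller pays to reset only from the bad state and otherwise waits. Raising the cost far enough (for example to 20) makes waiting optimal everywhere, which is the kind of strategic timing the abstract describes.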