Stochastic Games with Minimally Bounded Action Costs

arXiv - CS - Multiagent Systems | Pub Date: 2024-07-25 | DOI: arxiv-2407.18010

David Mguni


Citations: 0

Abstract


Stochastic Games with Minimally Bounded Action Costs

In many multi-player interactions, players incur a strictly positive cost each time they execute an action, e.g. 'menu costs' or transaction costs in financial systems. Since acting at every available opportunity would accumulate prohibitively large costs, players must decide strategically when to act as well as which action to take. This paper analyses a discrete-time stochastic game (SG) in which players face minimally bounded positive costs for each action and influence the system using impulse controls. We prove that SGs of two-sided impulse control have a unique value and characterise the saddle point equilibrium, in which the players execute actions at strategically chosen times in accordance with Markovian strategies. We prove that the game respects a dynamic programming principle and that the Markov perfect equilibrium can be computed as a limit point of a sequence of Bellman operations. We then introduce a new Q-learning variant, which we show converges almost surely to the value of the game, enabling solutions to be extracted in unknown settings. Lastly, we extend our results to settings with budgetary constraints.
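The idea of computing a solution as the limit of repeated Bellman operations, with a strictly positive cost attached to every action, can be illustrated on a toy problem. The sketch below is not the paper's two-player operator: it assumes a single controller, a small finite state space, and one impulse action that resets the state. All names (`idle_transitions`, `bellman_step`, `value_iteration`, `reset_state`) and the toy dynamics are hypothetical and chosen only for illustration.

```python
import numpy as np


def idle_transitions(n_states: int, p_up: float) -> np.ndarray:
    """Uncontrolled dynamics (assumed): drift up one state w.p. p_up, else stay."""
    P = np.zeros((n_states, n_states))
    for s in range(n_states):
        up = min(s + 1, n_states - 1)  # the top state is absorbing when idle
        P[s, up] += p_up
        P[s, s] += 1.0 - p_up
    return P


def bellman_step(v, reward, P, cost, gamma, reset_state=0):
    """One Bellman update: wait for free, or pay `cost` to jump to `reset_state`."""
    idle = reward + gamma * (P @ v)                      # do nothing this step
    act = reward - cost + gamma * (P[reset_state] @ v)   # impulse, then evolve
    return np.maximum(idle, act), act > idle


def value_iteration(reward, P, cost, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman update until the sup-norm change drops below `tol`."""
    if cost <= 0:
        raise ValueError("the action cost must be strictly positive")
    v = np.zeros(len(reward))
    for _ in range(max_iter):
        v_new, _ = bellman_step(v, reward, P, cost, gamma)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    # Read off the states in which paying the cost beats waiting.
    _, intervene = bellman_step(v, reward, P, cost, gamma)
    return v, intervene


if __name__ == "__main__":
    reward = -np.arange(6, dtype=float)   # being far from state 0 is penalised
    P = idle_transitions(6, p_up=0.5)
    v, intervene = value_iteration(reward, P, cost=0.5)
    print("value:", np.round(v, 3))
    print("intervene in state:", intervene)
```

Because every intervention costs a strictly positive amount, the controller in this toy never acts in the reset state itself, and a sufficiently large cost suppresses intervention everywhere; this is the "when to act" trade-off the abstract describes, in its simplest one-sided form.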
