{"title":"具有公共噪声的平均场Markov决策过程混沌的定量传播","authors":"M'ed'eric Motte, H. Pham","doi":"10.1214/23-ejp978","DOIUrl":null,"url":null,"abstract":"We investigate propagation of chaos for mean field Markov Decision Process with common noise (CMKV-MDP), and when the optimization is performed over randomized open-loop controls on infinite horizon. We first state a rate of convergence of order $M_N^\\gamma$, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure, and $\\gamma \\in (0,1]$ is an explicit constant, in the limit of the value functions of $N$-agent control problem with asymmetric open-loop controls, towards the value function of CMKV-MDP. Furthermore, we show how to explicitly construct $(\\epsilon+\\mathcal{O}(M_N^\\gamma))$-optimal policies for the $N$-agent model from $\\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on sharp comparison between the Bellman operators in the $N$-agent problem and the CMKV-MDP, and fine coupling of empirical measures.","PeriodicalId":50538,"journal":{"name":"Electronic Journal of Probability","volume":" ","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Quantitative propagation of chaos for mean field Markov decision process with common noise\",\"authors\":\"M'ed'eric Motte, H. Pham\",\"doi\":\"10.1214/23-ejp978\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We investigate propagation of chaos for mean field Markov Decision Process with common noise (CMKV-MDP), and when the optimization is performed over randomized open-loop controls on infinite horizon. 
We first state a rate of convergence of order $M_N^\\\\gamma$, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure, and $\\\\gamma \\\\in (0,1]$ is an explicit constant, in the limit of the value functions of $N$-agent control problem with asymmetric open-loop controls, towards the value function of CMKV-MDP. Furthermore, we show how to explicitly construct $(\\\\epsilon+\\\\mathcal{O}(M_N^\\\\gamma))$-optimal policies for the $N$-agent model from $\\\\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on sharp comparison between the Bellman operators in the $N$-agent problem and the CMKV-MDP, and fine coupling of empirical measures.\",\"PeriodicalId\":50538,\"journal\":{\"name\":\"Electronic Journal of Probability\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2022-07-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Electronic Journal of Probability\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1214/23-ejp978\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronic Journal of Probability","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/23-ejp978","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Quantitative propagation of chaos for mean field Markov decision process with common noise
We investigate propagation of chaos for the mean-field Markov decision process with common noise (CMKV-MDP), when the optimization is performed over randomized open-loop controls on an infinite horizon. We first establish a rate of convergence of order $M_N^\gamma$ for the value functions of the $N$-agent control problem with asymmetric open-loop controls towards the value function of the CMKV-MDP, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure, and $\gamma \in (0,1]$ is an explicit constant. Furthermore, we show how to explicitly construct $(\epsilon+\mathcal{O}(M_N^\gamma))$-optimal policies for the $N$-agent model from $\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on a sharp comparison between the Bellman operators of the $N$-agent problem and the CMKV-MDP, and on a fine coupling of empirical measures.
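The shape of the convergence statement can be sketched as follows; the explicit rates quoted for $M_N$ are the classical empirical-measure estimates of Fournier and Guillin, included here as an illustrative assumption about the typical behavior of $M_N$, not as a claim taken from the paper itself.

```latex
% V^N: value function of the N-agent control problem;
% V:   value function of the limiting CMKV-MDP.
\[
  \bigl| V^N - V \bigr| \;=\; \mathcal{O}\!\bigl( M_N^{\gamma} \bigr),
  \qquad \gamma \in (0,1],
\]
% where M_N bounds the mean Wasserstein deviation of the empirical
% measure \mu_N of N i.i.d. samples from a law \mu:
\[
  \mathbb{E}\bigl[ \mathcal{W}(\mu_N, \mu) \bigr] \;\le\; M_N .
\]
% Under suitable moment conditions on measures over \mathbb{R}^d,
% one typically has (Fournier--Guillin)
\[
  M_N \;\asymp\;
  \begin{cases}
    N^{-1/2}          & d = 1,\\[2pt]
    N^{-1/2}\log N    & d = 2,\\[2pt]
    N^{-1/d}          & d \ge 3.
  \end{cases}
\]
```

In particular, the curse of dimensionality enters only through $M_N$: the exponent $\gamma$ is explicit and dimension-independent, so the overall rate degrades with the state dimension exactly as the empirical-measure rate does.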
About the journal:
The Electronic Journal of Probability publishes full-size research articles in probability theory. The Electronic Communications in Probability (ECP), a sister journal of EJP, publishes short notes and research announcements in probability theory.
Both ECP and EJP are official journals of the Institute of Mathematical Statistics and the Bernoulli Society.