{"title":"具有公共噪声的平均场Markov决策过程混沌的定量传播","authors":"M'ed'eric Motte, H. Pham","doi":"10.1214/23-ejp978","DOIUrl":null,"url":null,"abstract":"We investigate propagation of chaos for mean field Markov Decision Process with common noise (CMKV-MDP), and when the optimization is performed over randomized open-loop controls on infinite horizon. We first state a rate of convergence of order $M_N^\\gamma$, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure, and $\\gamma \\in (0,1]$ is an explicit constant, in the limit of the value functions of $N$-agent control problem with asymmetric open-loop controls, towards the value function of CMKV-MDP. Furthermore, we show how to explicitly construct $(\\epsilon+\\mathcal{O}(M_N^\\gamma))$-optimal policies for the $N$-agent model from $\\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on sharp comparison between the Bellman operators in the $N$-agent problem and the CMKV-MDP, and fine coupling of empirical measures.","PeriodicalId":50538,"journal":{"name":"Electronic Journal of Probability","volume":" ","pages":""},"PeriodicalIF":1.3000,"publicationDate":"2022-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Quantitative propagation of chaos for mean field Markov decision process with common noise\",\"authors\":\"M'ed'eric Motte, H. Pham\",\"doi\":\"10.1214/23-ejp978\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We investigate propagation of chaos for mean field Markov Decision Process with common noise (CMKV-MDP), and when the optimization is performed over randomized open-loop controls on infinite horizon. 
We first state a rate of convergence of order $M_N^\\\\gamma$, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure, and $\\\\gamma \\\\in (0,1]$ is an explicit constant, in the limit of the value functions of $N$-agent control problem with asymmetric open-loop controls, towards the value function of CMKV-MDP. Furthermore, we show how to explicitly construct $(\\\\epsilon+\\\\mathcal{O}(M_N^\\\\gamma))$-optimal policies for the $N$-agent model from $\\\\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on sharp comparison between the Bellman operators in the $N$-agent problem and the CMKV-MDP, and fine coupling of empirical measures.\",\"PeriodicalId\":50538,\"journal\":{\"name\":\"Electronic Journal of Probability\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.3000,\"publicationDate\":\"2022-07-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Electronic Journal of Probability\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1214/23-ejp978\",\"RegionNum\":3,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronic Journal of Probability","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1214/23-ejp978","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Quantitative propagation of chaos for mean field Markov decision process with common noise
We investigate propagation of chaos for the mean-field Markov decision process with common noise (CMKV-MDP), when the optimization is performed over randomized open-loop controls on an infinite horizon. We first establish a rate of convergence of order $M_N^\gamma$ for the value functions of the $N$-agent control problem with asymmetric open-loop controls towards the value function of the CMKV-MDP, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure, and $\gamma \in (0,1]$ is an explicit constant. Furthermore, we show how to explicitly construct $(\epsilon+\mathcal{O}(M_N^\gamma))$-optimal policies for the $N$-agent model from $\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on a sharp comparison between the Bellman operators of the $N$-agent problem and the CMKV-MDP, and on a fine coupling of empirical measures.
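The shape of the convergence statement can be sketched as follows; the explicit rates quoted for $M_N$ are the classical empirical-measure estimates of Fournier and Guillin, included here as an illustrative assumption about the typical behavior of $M_N$, not as a claim taken from the paper itself.

```latex
% V^N: value function of the N-agent control problem;
% V:   value function of the limiting CMKV-MDP.
\[
  \bigl| V^N - V \bigr| \;=\; \mathcal{O}\!\bigl( M_N^{\gamma} \bigr),
  \qquad \gamma \in (0,1],
\]
% where M_N bounds the mean Wasserstein deviation of the empirical
% measure \mu_N of N i.i.d. samples from a law \mu:
\[
  \mathbb{E}\bigl[ \mathcal{W}(\mu_N, \mu) \bigr] \;\le\; M_N .
\]
% Under suitable moment conditions on measures over \mathbb{R}^d,
% one typically has (Fournier--Guillin)
\[
  M_N \;\asymp\;
  \begin{cases}
    N^{-1/2}          & d = 1,\\[2pt]
    N^{-1/2}\log N    & d = 2,\\[2pt]
    N^{-1/d}          & d \ge 3.
  \end{cases}
\]
```

In particular, the curse of dimensionality enters only through $M_N$: the exponent $\gamma$ is explicit and dimension-independent, so the overall rate degrades with the state dimension exactly as the empirical-measure rate does.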
About the journal:
The Electronic Journal of Probability publishes full-size research articles in probability theory. The Electronic Communications in Probability (ECP), a sister journal of EJP, publishes short notes and research announcements in probability theory.
Both ECP and EJP are official journals of the Institute of Mathematical Statistics and the Bernoulli Society.