Quantitative propagation of chaos for mean field Markov decision process with common noise

IF 1.3 | Zone 3 (Mathematics) | JCR Q2, Statistics & Probability
Médéric Motte, H. Pham
DOI: 10.1214/23-ejp978 (https://doi.org/10.1214/23-ejp978)
Publication date (as listed): 2022-07-26
Citations: 3

Abstract

We investigate propagation of chaos for mean field Markov decision processes with common noise (CMKV-MDP) when the optimization is performed over randomized open-loop controls on an infinite horizon. We first state a rate of convergence of order $M_N^\gamma$, where $M_N$ is the mean rate of convergence in Wasserstein distance of the empirical measure and $\gamma \in (0,1]$ is an explicit constant, for the convergence of the value functions of the $N$-agent control problem with asymmetric open-loop controls towards the value function of the CMKV-MDP. Furthermore, we show how to explicitly construct $(\epsilon+\mathcal{O}(M_N^\gamma))$-optimal policies for the $N$-agent model from $\epsilon$-optimal policies for the CMKV-MDP. Our approach relies on a sharp comparison between the Bellman operators of the $N$-agent problem and the CMKV-MDP, and on a fine coupling of empirical measures.
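
To give a feel for the quantity $M_N$ mentioned in the abstract, here is a minimal Python sketch that estimates the expected Wasserstein-1 distance between an empirical measure of $N$ i.i.d. samples and the law they are drawn from. It is an illustration only and not the paper's construction: the Uniform(0, 1) law on the real line is a toy assumption, the function names `w1_to_uniform` and `mean_rate` are hypothetical, and the paper's $M_N$ is defined on the model's own state space. The sketch relies on the standard one-dimensional identity $W_1(\mu,\nu)=\int_0^1 |F_\mu^{-1}(u)-F_\nu^{-1}(u)|\,du$.

```python
import random


def w1_to_uniform(samples):
    """Exact Wasserstein-1 distance between the empirical measure of
    `samples` (values in [0, 1]) and the Uniform(0, 1) law (toy assumption)."""
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        # The i-th order statistic is the empirical quantile on (a, b];
        # the uniform quantile function is u itself, so integrate |x - u| du.
        a, b = i / n, (i + 1) / n
        if x <= a:
            total += ((b - x) ** 2 - (a - x) ** 2) / 2
        elif x >= b:
            total += ((x - a) ** 2 - (x - b) ** 2) / 2
        else:
            total += ((x - a) ** 2 + (b - x) ** 2) / 2
    return total


def mean_rate(n, trials=200, seed=0):
    """Monte Carlo estimate of E[W1(empirical measure of n samples, Uniform(0, 1))]."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        acc += w1_to_uniform([rng.random() for _ in range(n)])
    return acc / trials


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, mean_rate(n))
```

Running the script prints the estimate for a few values of $N$; one would expect it to shrink as $N$ grows, which is the behaviour that a bound of order $M_N^\gamma$ turns into a convergence rate for the value functions.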
Source journal
Electronic Journal of Probability (Mathematics: Statistics & Probability)
CiteScore: 1.80
Self-citation rate: 7.10%
Articles published: 119
Review time: 4-8 weeks
Journal description: The Electronic Journal of Probability (EJP) publishes full-size research articles in probability theory. The Electronic Communications in Probability (ECP), a sister journal of EJP, publishes short notes and research announcements in probability theory. Both ECP and EJP are official journals of the Institute of Mathematical Statistics and the Bernoulli Society.