Mean-Field Multiagent Reinforcement Learning: A Decentralized Network Approach

IF 1.4 · CAS Tier 3 (Mathematics) · JCR Q2 (Mathematics, Applied)
Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu
{"title":"Mean-Field Multiagent Reinforcement Learning: A Decentralized Network Approach","authors":"Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu","doi":"10.1287/moor.2022.0055","DOIUrl":null,"url":null,"abstract":"One of the challenges for multiagent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information of the entire system. Whereas exciting progress has been made to analyze decentralized MARL with the network of agents for social networks and team video games, little is known theoretically for decentralized MARL with the network of states for modeling self-driving vehicles, ride-sharing, and data and traffic routing. This paper proposes a framework of localized training and decentralized execution to study MARL with the network of states. Localized training means that agents only need to collect local information in their neighboring states during the training phase; decentralized execution implies that agents can execute afterward the learned decentralized policies, which depend only on agents’ current states. The theoretical analysis consists of three key components: the first is the reformulation of the MARL system as a networked Markov decision process with teams of agents, enabling updating the associated team Q-function in a localized fashion; the second is the Bellman equation for the value function and the appropriate Q-function on the probability measure space; and the third is the exponential decay property of the team Q-function, facilitating its approximation with efficient sample efficiency and controllable error. The theoretical analysis paves the way for a new algorithm LTDE-Neural-AC, in which the actor–critic approach with overparameterized neural networks is proposed. The convergence and sample complexity are established and shown to be scalable with respect to the sizes of both agents and states. To the best of our knowledge, this is the first neural network–based MARL algorithm with network structure and provable convergence guarantee.Funding: X. Wei is partially supported by NSFC no. 12201343. R. Xu is partially supported by the NSF CAREER award DMS-2339240.","PeriodicalId":49852,"journal":{"name":"Mathematics of Operations Research","volume":"30 1","pages":""},"PeriodicalIF":1.4000,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Mathematics of Operations Research","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1287/moor.2022.0055","RegionNum":3,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATHEMATICS, APPLIED","Score":null,"Total":0}
Citations: 0

Abstract

One of the challenges for multiagent reinforcement learning (MARL) is designing efficient learning algorithms for a large system in which each agent has only limited or partial information of the entire system. Whereas exciting progress has been made in analyzing decentralized MARL with a network of agents for social networks and team video games, little is known theoretically for decentralized MARL with a network of states for modeling self-driving vehicles, ride-sharing, and data and traffic routing. This paper proposes a framework of localized training and decentralized execution to study MARL with a network of states. Localized training means that agents only need to collect local information in their neighboring states during the training phase; decentralized execution implies that agents can afterward execute the learned decentralized policies, which depend only on agents' current states. The theoretical analysis consists of three key components: the first is the reformulation of the MARL system as a networked Markov decision process with teams of agents, enabling the associated team Q-function to be updated in a localized fashion; the second is the Bellman equation for the value function and the appropriate Q-function on the probability measure space; and the third is the exponential decay property of the team Q-function, facilitating its approximation with good sample efficiency and controllable error. The theoretical analysis paves the way for a new algorithm, LTDE-Neural-AC, in which an actor–critic approach with overparameterized neural networks is proposed. The convergence and sample complexity are established and shown to be scalable with respect to the sizes of both agents and states. To the best of our knowledge, this is the first neural network–based MARL algorithm with network structure and a provable convergence guarantee.

Funding: X. Wei is partially supported by NSFC no. 12201343. R. Xu is partially supported by the NSF CAREER award DMS-2339240.
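To make the third component above concrete, here is a hedged sketch of the generic form that exponential decay bounds take for networked MDPs; the symbols (the team index i, the κ-hop neighborhood N_i^κ, the truncated Q-function Q̂_i, and the constants c and ρ) are illustrative and are not taken from the paper's exact statement:

\[
\sup_{s,\,a}\; \Bigl| Q_i(s, a) \;-\; \hat{Q}_i\bigl(s_{\mathcal{N}_i^{\kappa}},\, a_{\mathcal{N}_i^{\kappa}}\bigr) \Bigr|
\;\le\; c\,\rho^{\kappa + 1},
\qquad c > 0,\ \ \rho \in (0, 1).
\]

In words, the influence of states and actions more than κ hops away from team i decays geometrically in κ, so a truncated Q-function that only sees the κ-hop neighborhood is a controllably accurate substitute for the full team Q-function; this is what allows the training phase to rely on local information only, with a per-update cost that does not grow with the total size of the network.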
Source Journal

Mathematics of Operations Research (Management Science – Applied Mathematics)
CiteScore: 3.40
Self-citation rate: 5.90%
Articles published: 178
Review time: 15.0 months
Journal description: Mathematics of Operations Research is an international journal of the Institute for Operations Research and the Management Sciences (INFORMS). The journal invites articles concerned with the mathematical and computational foundations in the areas of continuous, discrete, and stochastic optimization; mathematical programming; dynamic programming; stochastic processes; stochastic models; simulation methodology; control and adaptation; networks; game theory; and decision theory. Also sought are contributions to learning theory and machine learning that have special relevance to decision making, operations research, and management science. The emphasis is on originality, quality, and importance; correctness alone is not sufficient. Significant developments in operations research and management science not having substantial mathematical interest should be directed to other journals such as Management Science or Operations Research.