Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players

Pragnya Alatur, Anas Barakat, Niao He
{"title":"Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players","authors":"Pragnya Alatur, Anas Barakat, Niao He","doi":"arxiv-2408.08075","DOIUrl":null,"url":null,"abstract":"Markov Potential Games (MPGs) form an important sub-class of Markov games,\nwhich are a common framework to model multi-agent reinforcement learning\nproblems. In particular, MPGs include as a special case the identical-interest\nsetting where all the agents share the same reward function. Scaling the\nperformance of Nash equilibrium learning algorithms to a large number of agents\nis crucial for multi-agent systems. To address this important challenge, we\nfocus on the independent learning setting where agents can only have access to\ntheir local information to update their own policy. In prior work on MPGs, the\niteration complexity for obtaining $\\epsilon$-Nash regret scales linearly with\nthe number of agents $N$. In this work, we investigate the iteration complexity\nof an independent policy mirror descent (PMD) algorithm for MPGs. We show that\nPMD with KL regularization, also known as natural policy gradient, enjoys a\nbetter $\\sqrt{N}$ dependence on the number of agents, improving over PMD with\nEuclidean regularization and prior work. Furthermore, the iteration complexity\nis also independent of the sizes of the agents' action spaces.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.08075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Markov Potential Games (MPGs) form an important sub-class of Markov games, a common framework for modeling multi-agent reinforcement learning problems. In particular, MPGs include as a special case the identical-interest setting where all agents share the same reward function. Scaling the performance of Nash equilibrium learning algorithms to a large number of agents is crucial for multi-agent systems. To address this challenge, we focus on the independent learning setting, where each agent only has access to its local information to update its own policy. In prior work on MPGs, the iteration complexity for obtaining $\epsilon$-Nash regret scales linearly with the number of agents $N$. In this work, we investigate the iteration complexity of an independent policy mirror descent (PMD) algorithm for MPGs. We show that PMD with KL regularization, also known as natural policy gradient, enjoys a better $\sqrt{N}$ dependence on the number of agents, improving over PMD with Euclidean regularization and over prior work. Furthermore, the iteration complexity is also independent of the sizes of the agents' action spaces.
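
To make the update rule concrete, the sketch below shows one independent PMD step with KL regularization for a single agent, i.e., the multiplicative-weights (natural-policy-gradient-style) update $\pi_i^{t+1}(a \mid s) \propto \pi_i^t(a \mid s)\exp(\eta\, Q_i^t(s,a))$ applied to a tabular policy. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `pmd_kl_update`, the NumPy representation, and the toy Q-value estimates are hypothetical, and how each agent estimates its Q-values from purely local information is left open.

```python
import numpy as np

def pmd_kl_update(policy, q_values, eta):
    """One independent PMD step with KL regularization for a single agent.

    policy:   (num_states, num_actions) array, each row a probability distribution.
    q_values: (num_states, num_actions) array, the agent's estimated Q-values
              under the current joint policy (estimation procedure not specified here).
    eta:      step size.
    """
    # Multiplicative-weights update: pi'(a|s) proportional to pi(a|s) * exp(eta * Q(s,a)),
    # computed in log space for numerical stability.
    logits = np.log(policy + 1e-12) + eta * q_values
    logits -= logits.max(axis=1, keepdims=True)
    new_policy = np.exp(logits)
    new_policy /= new_policy.sum(axis=1, keepdims=True)
    return new_policy


# Toy usage: 3 states, 2 actions, uniform initial policy, random Q-value estimates.
rng = np.random.default_rng(0)
pi = np.full((3, 2), 0.5)
q = rng.normal(size=(3, 2))
pi = pmd_kl_update(pi, q, eta=0.1)
print(pi)
```

The log-space computation and row-wise max subtraction are only for numerical stability; the update itself is the standard KL-regularized mirror descent step on the probability simplex, performed independently by each agent.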