{"title":"Independent Policy Mirror Descent for Markov Potential Games: Scaling to Large Number of Players","authors":"Pragnya Alatur, Anas Barakat, Niao He","doi":"arxiv-2408.08075","DOIUrl":null,"url":null,"abstract":"Markov Potential Games (MPGs) form an important sub-class of Markov games,\nwhich are a common framework to model multi-agent reinforcement learning\nproblems. In particular, MPGs include as a special case the identical-interest\nsetting where all the agents share the same reward function. Scaling the\nperformance of Nash equilibrium learning algorithms to a large number of agents\nis crucial for multi-agent systems. To address this important challenge, we\nfocus on the independent learning setting where agents can only have access to\ntheir local information to update their own policy. In prior work on MPGs, the\niteration complexity for obtaining $\\epsilon$-Nash regret scales linearly with\nthe number of agents $N$. In this work, we investigate the iteration complexity\nof an independent policy mirror descent (PMD) algorithm for MPGs. We show that\nPMD with KL regularization, also known as natural policy gradient, enjoys a\nbetter $\\sqrt{N}$ dependence on the number of agents, improving over PMD with\nEuclidean regularization and prior work. 
Furthermore, the iteration complexity\nis also independent of the sizes of the agents' action spaces.","PeriodicalId":501315,"journal":{"name":"arXiv - CS - Multiagent Systems","volume":"36 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multiagent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.08075","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Markov Potential Games (MPGs) form an important sub-class of Markov games,
which are a common framework to model multi-agent reinforcement learning
problems. In particular, MPGs include as a special case the identical-interest
setting where all the agents share the same reward function. Scaling the
performance of Nash equilibrium learning algorithms to a large number of agents
is crucial for multi-agent systems. To address this important challenge, we
focus on the independent learning setting, where each agent has access only to
its local information when updating its policy. In prior work on MPGs, the
iteration complexity for obtaining $\epsilon$-Nash regret scales linearly with
the number of agents $N$. In this work, we investigate the iteration complexity
of an independent policy mirror descent (PMD) algorithm for MPGs. We show that
PMD with KL regularization, also known as natural policy gradient, enjoys a
better $\sqrt{N}$ dependence on the number of agents, improving over both PMD
with Euclidean regularization and prior work. Furthermore, the iteration complexity
is also independent of the sizes of the agents' action spaces.
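The algorithm studied here, PMD with KL regularization, reduces on the probability simplex to a multiplicative-weights (natural policy gradient) update: each agent rescales its policy by the exponentiated value of its own actions and renormalizes, using only its local information. The following is a minimal illustrative sketch, not the paper's experimental setup: a two-agent identical-interest normal-form game (the special case of an MPG mentioned above) with a hypothetical payoff matrix and an arbitrary step size.

```python
import numpy as np

# Identical-interest two-player game (a special case of an MPG):
# both agents receive the same payoff R[a0, a1].  Payoffs are made up
# for illustration; action (0, 0) is the payoff-dominant outcome.
R = np.array([[1.0, 0.0],
              [0.0, 0.8]])

n_actions = R.shape[0]
policies = [np.full(n_actions, 1.0 / n_actions) for _ in range(2)]
eta = 0.5  # step size; a tuning assumption, not a value from the paper

for _ in range(200):
    # Each agent evaluates its own actions against the other agent's
    # current (fixed) policy -- its "local" Q-values.
    q0 = R @ policies[1]    # agent 0: expected payoff of each action
    q1 = R.T @ policies[0]  # agent 1: same, from its side
    for i, q in enumerate((q0, q1)):
        # PMD with KL regularization = multiplicative-weights update:
        # pi(a) <- pi(a) * exp(eta * q(a)), then renormalize.
        w = policies[i] * np.exp(eta * q)
        policies[i] = w / w.sum()

# Both agents concentrate on the payoff-dominant action pair (0, 0).
print(np.round(policies[0], 3), np.round(policies[1], 3))
```

Note that each agent's update touches only its own policy and its own action values; no agent observes the other's policy directly, which is the independent-learning structure the abstract refers to.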