Regularization of the policy updates for stabilizing Mean Field Games

Talal Algumaei, Rubén Solozabal, Réda Alami, Hakim Hacid, M. Debbah, Martin Takác
{"title":"Regularization of the policy updates for stabilizing Mean Field Games","authors":"Talal Algumaei, Rubén Solozabal, Réda Alami, Hakim Hacid, M. Debbah, Martin Takác","doi":"10.48550/arXiv.2304.01547","DOIUrl":null,"url":null,"abstract":"This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL) where multiple agents interact in the same environment and whose goal is to maximize the individual returns. Challenges arise when scaling up the number of agents due to the resultant non-stationarity that the many agents introduce. In order to address this issue, Mean Field Games (MFG) rely on the symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with larger number of states. Current methods rely on smoothing techniques such as averaging the q-values or the updates on the mean-field distribution. This work presents a different approach to stabilize the learning based on proximal updates on the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.","PeriodicalId":91995,"journal":{"name":"Advances in Knowledge Discovery and Data Mining : 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings. Part I. Pacific-Asia Conference on Knowledge Discovery and Data Mining (21st : 2017 : Cheju Isl...","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Knowledge Discovery and Data Mining : 21st Pacific-Asia Conference, PAKDD 2017, Jeju, South Korea, May 23-26, 2017, Proceedings. Part I. Pacific-Asia Conference on Knowledge Discovery and Data Mining (21st : 2017 : Cheju Isl...","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2304.01547","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

This work studies non-cooperative Multi-Agent Reinforcement Learning (MARL), where multiple agents interact in the same environment and each aims to maximize its individual return. Challenges arise when scaling up the number of agents due to the non-stationarity that the many agents introduce. To address this issue, Mean Field Games (MFG) rely on symmetry and homogeneity assumptions to approximate games with very large populations. Recently, deep Reinforcement Learning has been used to scale MFG to games with a larger number of states. Current methods rely on smoothing techniques, such as averaging the Q-values or the updates on the mean-field distribution. This work presents a different approach that stabilizes learning through proximal updates on the mean-field policy. We name our algorithm Mean Field Proximal Policy Optimization (MF-PPO), and we empirically show the effectiveness of our method in the OpenSpiel framework.
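The page does not reproduce the paper's equations, but since MF-PPO builds on PPO-style proximal (clipped) policy updates, a minimal sketch of the standard PPO clipped surrogate objective applied to a shared mean-field policy may help. This is an illustrative assumption, not the authors' implementation: the function name, the epsilon value, and the PyTorch framing are all hypothetical.

    # Minimal sketch (assumed, not from the paper): PPO's clipped surrogate
    # objective, which MF-PPO-style proximal updates are built around.
    import torch

    def clipped_surrogate_loss(log_probs_new, log_probs_old, advantages, eps=0.2):
        """Return the negated clipped surrogate (a loss to minimize by SGD).

        log_probs_new: log pi_theta(a|s) under the current mean-field policy.
        log_probs_old: log pi_theta_old(a|s) under the data-collecting policy.
        advantages:    advantage estimates computed against the population
                       (mean-field) distribution; how they are estimated is
                       left abstract here.
        """
        ratio = torch.exp(log_probs_new - log_probs_old)  # importance ratio
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
        # Taking the elementwise minimum keeps the update proximal: the policy
        # gains nothing from moving the ratio outside [1 - eps, 1 + eps].
        return -torch.min(unclipped, clipped).mean()

    # Toy usage with random tensors standing in for a batch of transitions.
    lp_new = torch.randn(64, requires_grad=True)
    lp_old = lp_new.detach() + 0.1 * torch.randn(64)
    adv = torch.randn(64)
    loss = clipped_surrogate_loss(lp_new, lp_old, adv)
    loss.backward()

The clipping is what makes the update "proximal": large policy changes between iterations are discouraged, which is the stabilizing effect the abstract contrasts with smoothing the Q-values or the mean-field distribution itself.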