A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning

W. Pan, Cheng Liu
{"title":"A Graph-Based Soft Actor Critic Approach in Multi-Agent Reinforcement Learning","authors":"W. Pan, Cheng Liu","doi":"10.15837/ijccc.2023.1.5062","DOIUrl":null,"url":null,"abstract":"Multi-Agent Reinforcement Learning (MARL) is widely used to solve various real-world problems. In MARL, the environment contains multiple agents. A good grasp of the environment can guide agents to learn cooperative strategies. In Centralized Training Decentralized Execution (CTDE), a centralized critic is used to guide cooperative strategies learning. However, having multiple agents in the environment leads to the curse of dimensionality and influence of other agents’ strategies, resulting in difficulties for centralized critics to learn good cooperative strategies. We propose a graph-based approach to overcome the above problems. It uses a graph neural network, which uses partial observations of agents as input, and information between agents is aggregated by graph methods to extract information about the whole environment. In this way, agents can improve their understanding of the overall state of the environment and other agents in the environment while avoiding dimensional explosion. Then we combine a dual critic dynamic decomposition method with soft actor-critic to train policy. The former uses individual and global rewards for learning, avoiding the influence of other agents’ strategies, and the latter help to learn an optional policy better. We call this approach Multi-Agent Graph-based soft Actor-Critic (MAGAC). We compare our proposed method with several classical MARL algorithms under the Multi-agent Particle Environment (MPE). The experimental results show that our method can achieve a faster learning speed while learning better policy.","PeriodicalId":179619,"journal":{"name":"Int. J. Comput. Commun. Control","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Commun. Control","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.15837/ijccc.2023.1.5062","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Multi-Agent Reinforcement Learning (MARL) is widely used to solve various real-world problems. In MARL, the environment contains multiple agents, and a good grasp of the environment can guide agents to learn cooperative strategies. In Centralized Training Decentralized Execution (CTDE), a centralized critic is used to guide the learning of cooperative strategies. However, the presence of multiple agents leads to the curse of dimensionality and to interference from the other agents' strategies, making it difficult for the centralized critic to learn good cooperative strategies. We propose a graph-based approach to overcome these problems. It uses a graph neural network that takes the agents' partial observations as input and aggregates information between agents with graph operations to extract information about the whole environment. In this way, agents can improve their understanding of the overall state of the environment and of the other agents while avoiding the dimensional explosion. We then combine a dual-critic dynamic decomposition method with soft actor-critic to train the policy: the former learns from both individual and global rewards, avoiding interference from the other agents' strategies, and the latter helps to learn an optimal policy. We call this approach Multi-Agent Graph-based soft Actor-Critic (MAGAC). We compare the proposed method with several classical MARL algorithms in the Multi-agent Particle Environment (MPE). The experimental results show that our method achieves a faster learning speed while learning a better policy.
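
To make the graph-based aggregation step concrete, below is a minimal sketch (not the authors' implementation) of a graph-based centralized critic in PyTorch: each agent's partial observation and action are encoded, neighbours' embeddings are mean-aggregated over an agent graph, and a per-agent Q head scores the result. All class names, layer sizes, and the mean-aggregation rule are illustrative assumptions; the paper's exact architecture and its dual-critic decomposition are not reproduced here.

```python
# Minimal sketch of a graph-based centralized critic (illustrative only, not MAGAC itself).
# Each agent encodes its partial observation and action, neighbours' embeddings are
# mean-aggregated over the agent graph, and a shared head outputs a per-agent Q value.
import torch
import torch.nn as nn


class GraphCritic(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim + act_dim, hidden)    # per-agent encoding
        self.message = nn.Linear(hidden, hidden)                # neighbour messages
        self.q_head = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                    nn.ReLU(),
                                    nn.Linear(hidden, 1))       # per-agent Q value

    def forward(self, obs, act, adj):
        # obs: (batch, n_agents, obs_dim), act: (batch, n_agents, act_dim)
        # adj: (batch, n_agents, n_agents) adjacency matrix of the agent graph
        h = torch.relu(self.encoder(torch.cat([obs, act], dim=-1)))
        # mean-aggregate neighbour messages to approximate global context
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        agg = torch.bmm(adj, self.message(h)) / deg
        return self.q_head(torch.cat([h, agg], dim=-1))          # (batch, n_agents, 1)


if __name__ == "__main__":
    critic = GraphCritic(obs_dim=8, act_dim=2)
    obs = torch.randn(4, 3, 8)           # 4 samples, 3 agents
    act = torch.randn(4, 3, 2)
    adj = torch.ones(4, 3, 3)            # fully connected agent graph
    print(critic(obs, act, adj).shape)   # torch.Size([4, 3, 1])
```

A fully connected adjacency matrix mixes information from all agents; a sparser graph restricts each agent to its neighbours, which is one way a graph-based critic can temper the dimensional explosion mentioned in the abstract.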