GDT: Multi-agent reinforcement learning framework based on adaptive grouping dynamic topological space

IF 8.1 | Zone 1, Computer Science | JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS
Licheng Sun, Hongbin Ma, Zhentao Guo
Information Sciences, volume 691, Article 121646. Published 2024-11-15.
DOI: 10.1016/j.ins.2024.121646
https://www.sciencedirect.com/science/article/pii/S0020025524015603
Citations: 0

Abstract

In many real-world scenarios, tasks involve coordinating multiple agents, such as managing robot clusters, drone swarms, and autonomous vehicles. These tasks are commonly addressed using Multi-Agent Reinforcement Learning (MARL). However, existing MARL algorithms often lack foresight regarding the number and types of agents involved, so agents must generalize across various task configurations. This can lead to suboptimal performance through underestimated action values and the selection of less effective joint policies. To address these challenges, we propose GDT, a novel multi-agent deep reinforcement learning framework based on an adaptive grouping dynamic topological space. GDT uses a group mesh topology to interconnect the local action value functions of the agents, enabling effective coordination and knowledge sharing among them. By computing three different interpretations of the action value function, GDT overcomes monotonicity constraints and derives a more effective overall action value function. GDT also groups agents with high similarity to facilitate parameter sharing, thereby enhancing knowledge transfer and generalization across different scenarios. Furthermore, GDT introduces a strategy regularization method for optimal exploration of multiple action spaces: each agent is assigned an independent entropy temperature during exploration, enabling agents to explore potential actions efficiently and approximate total state values. Experimental results demonstrate that GDT significantly outperforms state-of-the-art algorithms on Google Research Football (GRF) and the StarCraft Multi-Agent Challenge (SMAC). In SMAC tasks in particular, GDT achieves a success rate of nearly 100% across almost all Hard Map and Super Hard Map scenarios. We additionally validate the effectiveness of the algorithm on non-monotonic matrix games.
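The abstract's point about monotonicity constraints and non-monotonic matrix games can be made concrete with a small sketch. The code below is not GDT and is not taken from the paper: it uses an illustrative 3x3 payoff matrix (an assumption, of the kind commonly used for such games) and a plain additive value decomposition as a stand-in for a monotonic mixer. It shows how per-agent greedy action selection under that decomposition can miss the best joint action. All names (`PAYOFF`, `additive_fit`, and so on) are hypothetical.

```python
# Illustrative payoff matrix (an assumption, not taken from the paper):
# rows are agent 1's actions, columns are agent 2's actions.
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def additive_fit(payoff):
    """Least-squares fit of payoff[i][j] ~ q1[i] + q2[j] with uniform weights.

    For a fully observed matrix the solution is
    row mean + column mean - grand mean; the grand mean is folded into q1.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    grand = sum(sum(row) for row in payoff) / (n_rows * n_cols)
    q1 = [sum(row) / n_cols - grand for row in payoff]
    q2 = [sum(payoff[i][j] for i in range(n_rows)) / n_rows
          for j in range(n_cols)]
    return q1, q2


def greedy_joint_action(q1, q2):
    """Each agent independently picks the action maximizing its own utility."""
    a1 = max(range(len(q1)), key=lambda i: q1[i])
    a2 = max(range(len(q2)), key=lambda j: q2[j])
    return a1, a2


def optimal_joint_action(payoff):
    """Brute-force search over all joint actions."""
    cells = [(i, j) for i in range(len(payoff)) for j in range(len(payoff[0]))]
    return max(cells, key=lambda ij: payoff[ij[0]][ij[1]])


q1, q2 = additive_fit(PAYOFF)
decentralized = greedy_joint_action(q1, q2)
best = optimal_joint_action(PAYOFF)

print("decentralized greedy:", decentralized,
      "payoff", PAYOFF[decentralized[0]][decentralized[1]])
print("true optimum:", best, "payoff", PAYOFF[best[0]][best[1]])
```

In this toy game the large penalties around the best cell drag down the fitted utility of each agent's optimal action, so the decentralized greedy choice settles on a joint action with payoff 0 while the true optimum pays 8. This is the kind of underestimation and suboptimal joint policy the abstract attributes to monotonic factorizations; how GDT itself avoids it is described only in the full paper.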
Source journal
Information Sciences (Engineering and Technology: Computer Science, Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months
Journal description: Informatics and Computer Science Intelligent Systems Applications is an international journal that publishes original and creative research findings in the field of information sciences, along with a limited number of timely tutorial and survey contributions. The journal aims to serve a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up to date with cutting-edge research in information science, knowledge engineering, and intelligent systems. Readers are expected to share a common interest in information science, but they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.