Heterogeneous multi-agent reinforcement learning for zero-shot scalable collaboration

IF 5.5 | CAS Tier 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan
{"title":"零射击可扩展协作的异构多智能体强化学习","authors":"Xudong Guo,&nbsp;Daming Shi,&nbsp;Junjie Yu,&nbsp;Wenhui Fan","doi":"10.1016/j.neucom.2025.130716","DOIUrl":null,"url":null,"abstract":"<div><div>The emergence of multi-agent reinforcement learning (MARL) is significantly transforming various fields like autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and the scale of these systems dynamically fluctuates. Consequently, in order to achieve zero-shot scalable collaboration, it is essential that strategies for different roles can be updated flexibly according to the scales, which is still a challenge for current MARL frameworks. To address this, we propose a novel MARL framework named <em>S</em>calable and <em>H</em>eterogeneous <em>P</em>roximal <em>P</em>olicy <em>O</em>ptimization <em>(SHPPO)</em>, integrating heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to learn strategy patterns for each agent adaptively. Second, we introduce a heterogeneous layer to be inserted into decision-making networks, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity, allowing SHPPO to adapt effectively to varying scales. SHPPO exhibits superior performance in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability, and offering insights into the learned latent variables’ impact on team performance by visualization.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130716"},"PeriodicalIF":5.5000,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Heterogeneous multi-agent reinforcement learning for zero-shot scalable collaboration\",\"authors\":\"Xudong Guo,&nbsp;Daming Shi,&nbsp;Junjie Yu,&nbsp;Wenhui Fan\",\"doi\":\"10.1016/j.neucom.2025.130716\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The emergence of multi-agent reinforcement learning (MARL) is significantly transforming various fields like autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and the scale of these systems dynamically fluctuates. Consequently, in order to achieve zero-shot scalable collaboration, it is essential that strategies for different roles can be updated flexibly according to the scales, which is still a challenge for current MARL frameworks. To address this, we propose a novel MARL framework named <em>S</em>calable and <em>H</em>eterogeneous <em>P</em>roximal <em>P</em>olicy <em>O</em>ptimization <em>(SHPPO)</em>, integrating heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to learn strategy patterns for each agent adaptively. Second, we introduce a heterogeneous layer to be inserted into decision-making networks, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity, allowing SHPPO to adapt effectively to varying scales. 
SHPPO exhibits superior performance in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability, and offering insights into the learned latent variables’ impact on team performance by visualization.</div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"648 \",\"pages\":\"Article 130716\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225013888\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225013888","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

The emergence of multi-agent reinforcement learning (MARL) is significantly transforming various fields like autonomous vehicle networks. However, real-world multi-agent systems typically contain multiple roles, and the scale of these systems dynamically fluctuates. Consequently, in order to achieve zero-shot scalable collaboration, it is essential that strategies for different roles can be updated flexibly according to the scales, which is still a challenge for current MARL frameworks. To address this, we propose a novel MARL framework named Scalable and Heterogeneous Proximal Policy Optimization (SHPPO), integrating heterogeneity into parameter-shared PPO-based MARL networks. We first leverage a latent network to learn strategy patterns for each agent adaptively. Second, we introduce a heterogeneous layer to be inserted into decision-making networks, whose parameters are specifically generated by the learned latent variables. Our approach is scalable as all the parameters are shared except for the heterogeneous layer, and gains both inter-individual and temporal heterogeneity, allowing SHPPO to adapt effectively to varying scales. SHPPO exhibits superior performance in classic MARL environments like Starcraft Multi-Agent Challenge (SMAC) and Google Research Football (GRF), showcasing enhanced zero-shot scalability, and offering insights into the learned latent variables’ impact on team performance by visualization.
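The abstract describes an actor whose parameters are shared across agents except for one heterogeneous layer, whose weights are generated from a per-agent latent variable learned by a separate latent network. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' released code: the class names (LatentNet, HeteroLayer, SharedActor), layer sizes, and the hypernetwork-style weight generation are all assumptions made for illustration.

```python
# Hypothetical sketch of a latent-conditioned heterogeneous layer inside an
# otherwise parameter-shared actor. Names and dimensions are illustrative
# assumptions, not the paper's published implementation.
import torch
import torch.nn as nn

class LatentNet(nn.Module):
    """Maps an agent's observation to a latent strategy variable z (assumed form)."""
    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.encoder(obs)

class HeteroLayer(nn.Module):
    """Linear layer whose weights and bias are generated from z, hypernetwork-style."""
    def __init__(self, latent_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        self.weight_gen = nn.Linear(latent_dim, in_dim * out_dim)
        self.bias_gen = nn.Linear(latent_dim, out_dim)

    def forward(self, h: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
        # Per-agent weights: each agent gets its own (out_dim x in_dim) matrix from z.
        w = self.weight_gen(z).view(-1, self.out_dim, self.in_dim)
        b = self.bias_gen(z)
        return torch.bmm(w, h.unsqueeze(-1)).squeeze(-1) + b

class SharedActor(nn.Module):
    """Parameter-shared actor with one latent-conditioned heterogeneous layer."""
    def __init__(self, obs_dim: int, act_dim: int, latent_dim: int = 8, hidden: int = 64):
        super().__init__()
        self.latent = LatentNet(obs_dim, latent_dim)
        self.shared_in = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.hetero = HeteroLayer(latent_dim, hidden, hidden)
        self.shared_out = nn.Linear(hidden, act_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.latent(obs)               # per-agent, per-step latent variable
        h = self.shared_in(obs)            # shared parameters
        h = torch.relu(self.hetero(h, z))  # heterogeneous, latent-generated parameters
        return self.shared_out(h)          # shared parameters -> action logits

# Usage: action logits for 5 agents at once; adding or removing agents needs
# no new parameters, since only the latent variable z differs per agent.
actor = SharedActor(obs_dim=30, act_dim=10)
logits = actor(torch.randn(5, 30))
print(logits.shape)  # torch.Size([5, 10])
```

Because every trainable parameter lives in modules applied identically to all agents, and heterogeneity enters only through the agent-specific latent variable, such a design can in principle be evaluated zero-shot on team sizes unseen during training.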
Source journal: Neurocomputing (Engineering & Technology - Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Annual articles: 1382
Review time: 70 days
Aims and scope: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.