Solving large-scale multi-agent tasks via transfer learning with dynamic state representation

IF 2.3 · CAS Tier 4 (Computer Science) · JCR Q2 (Computer Science)
Lintao Dou, Zhen Jia, Jian Huang
{"title":"Solving large-scale multi-agent tasks via transfer learning with dynamic state representation","authors":"Lintao Dou, Zhen Jia, Jian Huang","doi":"10.1177/17298806231162440","DOIUrl":null,"url":null,"abstract":"Many research results have emerged in the past decade regarding multi-agent reinforcement learning. These include the successful application of asynchronous advantage actor-critic, double deep Q-network and other algorithms in multi-agent environments, and the more representative multi-agent training method based on the classical centralized training distributed execution algorithm QMIX. However, in a large-scale multi-agent environment, training becomes a major challenge due to the exponential growth of the state-action space. In this article, we design a training scheme from small-scale multi-agent training to large-scale multi-agent training. We use the transfer learning method to enable the training of large-scale agents to use the knowledge accumulated by training small-scale agents. We achieve policy transfer between tasks with different numbers of agents by designing a new dynamic state representation network, which uses a self-attention mechanism to capture and represent the local observations of agents. The dynamic state representation network makes it possible to expand the policy model from a few agents (4 agents, 10 agents) task to large-scale agents (16 agents, 50 agents) task. Furthermore, we conducted experiments in the famous real-time strategy game Starcraft II and the multi-agent research platform MAgent. And also set unmanned aerial vehicles trajectory planning simulations. Experimental results show that our approach not only reduces the time consumption of a large number of agent training tasks but also improves the final training performance.","PeriodicalId":50343,"journal":{"name":"International Journal of Advanced Robotic Systems","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2023-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Advanced Robotic Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1177/17298806231162440","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"Computer Science","Score":null,"Total":0}
Citations: 0

Abstract

Many research results on multi-agent reinforcement learning have emerged in the past decade. These include the successful application of asynchronous advantage actor-critic, double deep Q-network, and other algorithms in multi-agent environments, as well as the representative multi-agent training method based on QMIX, a classical centralized-training, distributed-execution algorithm. However, in a large-scale multi-agent environment, training becomes a major challenge due to the exponential growth of the state-action space. In this article, we design a training scheme that scales from small-scale to large-scale multi-agent training. We use transfer learning so that the training of large-scale agent tasks can reuse the knowledge accumulated while training small-scale agents. We achieve policy transfer between tasks with different numbers of agents by designing a new dynamic state representation network, which uses a self-attention mechanism to capture and represent the local observations of agents. The dynamic state representation network makes it possible to expand the policy model from tasks with a few agents (4 or 10) to tasks with large numbers of agents (16 or 50). Furthermore, we conducted experiments in the well-known real-time strategy game StarCraft II and on the multi-agent research platform MAgent, and we also set up unmanned aerial vehicle trajectory planning simulations. Experimental results show that our approach not only reduces the time consumed by large-scale agent training tasks but also improves the final training performance.
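To make the described mechanism concrete, below is a minimal sketch of how a self-attention layer can summarize the local observations of a variable-sized team into a fixed-size state representation. This is an illustrative PyTorch implementation under our own assumptions; the class name, dimensions, and mean-pooling step are hypothetical and not taken from the paper.

```python
# Illustrative sketch (PyTorch): a self-attention encoder over a variable
# number of agents. All names and dimensions are assumptions for exposition,
# not the paper's actual architecture.
import torch
import torch.nn as nn

class DynamicStateRepresentation(nn.Module):
    def __init__(self, obs_dim: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)   # per-agent observation embedding
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.out = nn.Linear(embed_dim, embed_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim); n_agents may differ between tasks.
        x = self.embed(obs)
        attended, _ = self.attn(x, x, x)             # each agent attends to all others
        pooled = attended.mean(dim=1)                # permutation-invariant, fixed-size summary
        return self.out(pooled)                      # (batch, embed_dim), independent of n_agents

# The same module handles teams of different sizes without any change:
net = DynamicStateRepresentation(obs_dim=32)
small_task = torch.randn(8, 4, 32)    # 8 samples, 4 agents each
large_task = torch.randn(8, 50, 32)   # 8 samples, 50 agents each
print(net(small_task).shape, net(large_task).shape)  # both: torch.Size([8, 64])
```

Because the attention operates over the agent dimension and the result is pooled, the same parameters apply whether a task has 4 agents or 50; this size-invariance is the property that makes policy transfer across team sizes possible.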
Source journal: International Journal of Advanced Robotic Systems
CiteScore: 6.50 · Self-citation rate: 0.00% · Annual publications: 65 · Review time: 6 months
Journal description: International Journal of Advanced Robotic Systems (IJARS) is a JCR-ranked, peer-reviewed open access journal covering the full spectrum of robotics research. The journal is addressed to both practicing professionals and researchers in the field of robotics and its specialty areas. IJARS features fourteen topic areas, each headed by a Topic Editor-in-Chief, integrating all aspects of robotics research under the journal's domain.