Solving large-scale multi-agent tasks via transfer learning with dynamic state representation

IF 2.1 | Zone 4, Computer Science | Q2, Computer Science

International Journal of Advanced Robotic Systems, Pub Date: 2023-03-01, DOI: 10.1177/17298806231162440

Lintao Dou, Zhen Jia, Jian Huang


Citations: 0


Solving large-scale multi-agent tasks via transfer learning with dynamic state representation

Many research results have emerged in the past decade regarding multi-agent reinforcement learning. These include the successful application of asynchronous advantage actor-critic, double deep Q-network, and other algorithms in multi-agent environments, as well as the more representative multi-agent training methods based on QMIX, the classical centralized-training, distributed-execution algorithm. However, in a large-scale multi-agent environment, training becomes a major challenge because the state-action space grows exponentially with the number of agents. In this article, we design a training scheme that proceeds from small-scale to large-scale multi-agent training. We use transfer learning so that training with a large number of agents can reuse the knowledge accumulated while training with a small number of agents. We achieve policy transfer between tasks with different numbers of agents by designing a new dynamic state representation network, which uses a self-attention mechanism to capture and represent the local observations of agents. The dynamic state representation network makes it possible to extend the policy model from tasks with a few agents (4 or 10 agents) to large-scale tasks (16 or 50 agents). Furthermore, we conducted experiments in the well-known real-time strategy game StarCraft II and on the multi-agent research platform MAgent, and we also set up unmanned aerial vehicle trajectory planning simulations. Experimental results show that our approach not only reduces the training time for tasks with a large number of agents but also improves the final training performance.
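The abstract does not give the architecture of the dynamic state representation network, so the following is only a minimal sketch of the general idea it describes: self-attention over per-entity observation features, followed by pooling, yields a fixed-size vector no matter how many agents are present. That property is what allows the same weights to be reused when moving from a small task to a large one. All names here (`represent_state`, `make_weights`, `FEATURE_DIM`, `HIDDEN_DIM`), the mean pooling step, and the random weights are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_weights(d_in, d_out, rng):
    """Random d_out x d_in projection matrix (illustrative initialisation)."""
    scale = 1.0 / math.sqrt(d_in)
    return [[rng.uniform(-scale, scale) for _ in range(d_in)] for _ in range(d_out)]


def represent_state(entities, Wq, Wk, Wv):
    """Scaled dot-product self-attention over entities, then mean pooling.

    entities: list of n feature vectors; n may differ between tasks.
    Returns a vector of length len(Wv), independent of n.
    """
    queries = [matvec(Wq, e) for e in entities]
    keys = [matvec(Wk, e) for e in entities]
    values = [matvec(Wv, e) for e in entities]
    d_k = len(keys[0])

    attended = []
    for q in queries:
        # Attention weights of this entity over all entities (itself included).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])

    # Mean pooling (an assumption of this sketch) removes the dependence on n.
    n = len(attended)
    return [sum(row[j] for row in attended) / n for j in range(len(attended[0]))]


rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM = 6, 8  # hypothetical sizes
Wq = make_weights(FEATURE_DIM, HIDDEN_DIM, rng)
Wk = make_weights(FEATURE_DIM, HIDDEN_DIM, rng)
Wv = make_weights(FEATURE_DIM, HIDDEN_DIM, rng)


def random_entities(n):
    """Stand-in for the local observations of n agents."""
    return [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(n)]


small = represent_state(random_entities(4), Wq, Wk, Wv)   # 4-agent task
large = represent_state(random_entities(16), Wq, Wk, Wv)  # 16-agent task
print(len(small), len(large))  # both have length HIDDEN_DIM
```

Because the projection matrices act on one entity at a time and the pooled output has a fixed length, a policy head trained on the 4-agent representation can, in principle, be applied unchanged to the 16-agent one. Whether that transfer actually helps is the empirical question the paper's experiments address.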


Source journal

International Journal of Advanced Robotic Systems ROBOTICS-

CiteScore

6.50

Self-citation rate

0.00%

Articles published

Review time

6 months

About the journal: International Journal of Advanced Robotic Systems (IJARS) is a JCR-ranked, peer-reviewed open access journal covering the full spectrum of robotics research. The journal is addressed to both practicing professionals and researchers in the field of robotics and its specialty areas. IJARS features fourteen topic areas, each headed by a Topic Editor-in-Chief, integrating all aspects of research in robotics under the journal's domain.