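High-Sample-Efficient Multiagent Reinforcement Learning for Navigation and Collision Avoidance of UAV Swarms in Multitask Environments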

IF 8.2 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Information Systems)

IEEE Internet of Things Journal Pub Date : 2024-11-07 DOI:10.1109/JIOT.2024.3409169

Jiaming Cheng;Ni Li;Ban Wang;Shuhui Bu;Ming Zhou

IEEE Xplore: https://ieeexplore.ieee.org/document/10747043/

Citations: 0

Abstract

High-Sample-Efficient Multiagent Reinforcement Learning for Navigation and Collision Avoidance of UAV Swarms in Multitask Environments

Multiagent reinforcement learning (MARL) algorithms have shown promise for Internet of Things devices such as unmanned aerial vehicle (UAV) swarms. However, the dynamic nature of large-scale swarm systems, in which the number of agents and the number of observed neighbors change constantly, makes MARL difficult to adapt. Existing approaches struggle to extract meaningful features and require a large number of experience samples, resulting in low sample efficiency and high risk ratios. Moreover, these methods are effective only in task-specific scenarios and perform poorly in multitask settings. To overcome these challenges, this study proposes a sample-efficient and scalable MARL approach for UAV swarms. The approach uses a hypernetwork-based embedding attention (HEA) mechanism for the state representation of the policy network and a multiencoder gated transformer with multilayer attention (MEGTrMA) for the value function. The HEA automatically generates weights for each agent to adapt to dynamic scenarios, which enhances representation ability and adaptability and reduces the cost of trial and error, improving learning efficiency. The MEGTrMA captures the contribution of each agent to the global observation, establishing long-term dependencies among agents and facilitating stable policy learning in multitask scenarios. Simulation results demonstrate that the proposed method is scalable, generalizable, and sample-efficient. Compared with learning from scratch, progressively increasing the number of UAVs and their corresponding neighbors reduces training time to less than one-fifth of the original. Additionally, the average number of collisions is reduced by an order of magnitude for large-scale UAV swarms.
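The abstract does not give the HEA architecture, so the following is only a minimal sketch of the general idea it describes: a hypernetwork produces the weights used to embed each observed neighbor, and attention pools those embeddings into a fixed-size vector regardless of how many neighbors are visible. The class name `HyperAttentionPool`, the dimensions, the `tanh` nonlinearity, and the single linear hypernetwork are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class HyperAttentionPool:
    """Hypothetical illustration only; NOT the paper's HEA module."""

    def __init__(self, own_dim, nbr_dim, embed_dim, seed=0):
        rng = random.Random(seed)
        self.nbr_dim = nbr_dim
        self.embed_dim = embed_dim
        # Hypernetwork (assumed linear): own state -> flattened embedding weights.
        self.hyper = [[rng.uniform(-0.5, 0.5) for _ in range(own_dim)]
                      for _ in range(embed_dim * nbr_dim)]
        # Query projection (assumed linear): own state -> attention query.
        self.query = [[rng.uniform(-0.5, 0.5) for _ in range(own_dim)]
                      for _ in range(embed_dim)]

    def __call__(self, own_state, neighbors):
        if not neighbors:
            # No visible neighbors: return a zero embedding of fixed size.
            return [0.0] * self.embed_dim
        # Generate the neighbor-embedding weights from the agent's own state.
        flat = matvec(self.hyper, own_state)
        weights = [flat[i * self.nbr_dim:(i + 1) * self.nbr_dim]
                   for i in range(self.embed_dim)]
        # Embed every neighbor with the generated weights.
        embedded = [[math.tanh(x) for x in matvec(weights, n)] for n in neighbors]
        # Scaled dot-product attention of the own-state query over neighbors.
        q = matvec(self.query, own_state)
        scale = math.sqrt(self.embed_dim)
        scores = [sum(a * b for a, b in zip(q, e)) / scale for e in embedded]
        attn = softmax(scores)
        # The attention-weighted sum has the same size for any neighbor count.
        return [sum(a * e[k] for a, e in zip(attn, embedded))
                for k in range(self.embed_dim)]


pool = HyperAttentionPool(own_dim=4, nbr_dim=3, embed_dim=8)
own = [0.1, -0.2, 0.3, 0.0]
two = [[1.0, 0.0, 0.5], [0.2, -0.4, 0.1]]
five = two + [[0.3, 0.3, 0.3], [-1.0, 0.5, 0.0], [0.0, 0.0, 1.0]]
print(len(pool(own, two)), len(pool(own, five)))  # prints "8 8"
```

Because the pooled output has the same length for two neighbors as for five, a downstream policy network in this sketch would not need to change shape as the swarm grows, which is the kind of property the abstract attributes to its scalable state representation.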

Source journal

IEEE Internet of Things Journal (Computer Science, Information Systems)

CiteScore: 17.60

Self-citation rate: 13.20%

Articles published: 1982

Journal description: The IEEE Internet of Things (IoT) Journal publishes articles and review articles covering various aspects of IoT, including IoT system architecture, IoT enabling technologies, IoT communication and networking protocols such as network coding, and IoT services and applications. Topics encompass IoT's impact on sensor technologies, big data management, and future Internet design for applications such as smart cities and smart homes. Fields of interest include IoT architectures such as things-centric, data-centric, and service-oriented IoT architecture; IoT enabling technologies and systematic integration such as sensor technologies, big sensor data management, and future Internet design for IoT; IoT services, applications, and test-beds such as IoT service middleware, IoT application programming interfaces (APIs), IoT application design, and IoT trials/experiments; and IoT standardization activities and technology development in standards development organizations (SDOs) such as IEEE, IETF, ITU, 3GPP, and ETSI.