Digital Twin-Enabled Deep Reinforcement Learning for Safety-Guaranteed Flocking Motion of UAV Swarm

IF 2.5, Zone 4 (Computer Science), JCR Q3 (Telecommunications)

Transactions on Emerging Telecommunications Technologies, Pub Date: 2024-11-06, DOI: 10.1002/ett.70011

Zhilin Li, Lei Lei, Gaoqing Shen, Xiaochang Liu, Xiaojiao Liu

Volume 35, Issue 11. Journal Article. Full text: https://onlinelibrary.wiley.com/doi/10.1002/ett.70011

Citations: 0

Abstract



Multi-agent deep reinforcement learning (MADRL) has become a typical paradigm for the flocking motion of UAV swarms in dynamic, stochastic environments. However, sim-to-real problems, such as the reality gap, training efficiency, and safety issues, restrict the application of MADRL in flocking motion scenarios. To address these problems, we first propose a digital twin (DT)-enabled training framework. With the assistance of high-fidelity digital twin simulation, effective policies can be trained efficiently. Based on the multi-agent proximal policy optimization (MAPPO) algorithm, we then design a learning approach for flocking motion with a matching observation space, action space, and reward function. Afterward, we employ a distributed flocking center estimation algorithm based on position consensus. The estimated center is used as a policy input to improve aggregation behavior. Moreover, we introduce a repulsion scheme, which adds a repulsion force to the action to prevent UAVs from colliding with neighbors and obstacles. Simulation results show that our method performs well in maintaining flocking formation and avoiding collisions, and has better decision-making ability in near-realistic environments.
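The abstract names two mechanisms, consensus-based center estimation and a repulsion term added to the action, without giving their equations. The following is only an illustrative sketch under stated assumptions, not the authors' method: it uses a standard linear average-consensus update for the center estimate and a generic inverse-distance repulsion force. The function names and the `step`, `safe_dist`, and `gain` parameters are hypothetical, and positions are assumed to be 2D.

```python
import math


def consensus_center(positions, neighbors, step=0.2, iterations=200):
    """Estimate the swarm center at every UAV by linear average consensus.

    ASSUMPTION: textbook average consensus, not the paper's exact rule.
    positions: list of (x, y) tuples, one per UAV.
    neighbors: dict mapping a UAV index to its neighbor indices
               (undirected, connected communication graph).
    step must be smaller than 1 / (maximum neighbor count) to converge.
    Returns one (x, y) center estimate per UAV.
    """
    # Each UAV starts from its own position as the initial estimate.
    estimates = [tuple(p) for p in positions]
    for _ in range(iterations):
        updated = []
        for i, (cx, cy) in enumerate(estimates):
            # Move toward the neighbors' current estimates.
            dx = sum(estimates[j][0] - cx for j in neighbors[i])
            dy = sum(estimates[j][1] - cy for j in neighbors[i])
            updated.append((cx + step * dx, cy + step * dy))
        estimates = updated
    return estimates


def repulsion_force(own, others, safe_dist=2.0, gain=1.0):
    """Sum of repulsive pushes away from neighbors or obstacles that are
    closer than safe_dist.

    ASSUMPTION: magnitude gain * (1/d - 1/safe_dist), a generic
    potential-field form chosen for illustration.
    """
    fx = fy = 0.0
    for ox, oy in others:
        dx, dy = own[0] - ox, own[1] - oy
        dist = math.hypot(dx, dy)
        if 0.0 < dist < safe_dist:
            magnitude = gain * (1.0 / dist - 1.0 / safe_dist)
            fx += magnitude * dx / dist
            fy += magnitude * dy / dist
    return fx, fy


def safe_action(policy_action, own, others, safe_dist=2.0, gain=1.0):
    """Add the repulsion force to the 2D action produced by the policy."""
    fx, fy = repulsion_force(own, others, safe_dist, gain)
    return policy_action[0] + fx, policy_action[1] + fy
```

In this sketch, each UAV would feed its own entry of `consensus_center(...)` into its policy as the estimated flocking center, and pass the policy output through `safe_action(...)` before execution. With an undirected, connected graph and a sufficiently small `step`, every estimate approaches the mean of the initial positions.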


Source journal

Transactions on Emerging Telecommunications Technologies (Telecommunications)

CiteScore: 8.90

Self-citation rate: 13.90%

Articles published: 249

Journal description: Transactions on Emerging Telecommunications Technologies (ETT), formerly known as European Transactions on Telecommunications (ETT), has the following aims:
- to attract cutting-edge publications from leading researchers and research groups around the world
- to become a highly cited source of timely research findings in emerging fields of telecommunications
- to limit revision and publication cycles to a few months and thus make the journal significantly more attractive to publish in
- to become the leading journal for publishing the latest developments in telecommunications