Deep Reinforcement Learning for Flocking Control of UAVs in Complex Environments

Mahsoo Salimi, Philippe Pasquier
DOI: 10.1109/ICRAE53653.2021.9657767
Venue: 2021 6th International Conference on Robotics and Automation Engineering (ICRAE)
Published: 2021-11-19
Citations: 4

Abstract

Flocking formation of unmanned aerial vehicles (UAVs) is an open challenge due to kinematic complexity and uncertainty in complex environments. In this paper, the UAV flocking control problem is formulated as a partially observable Markov decision process (POMDP) and solved by deep reinforcement learning. In particular, we consider a leader-follower configuration in which consensus among all UAVs is used to train a shared control policy, and each UAV acts on the local information it collects. To avoid collisions among UAVs while guaranteeing flocking and navigation, the reward function combines a global flocking-maintenance term, a mutual reward, and a collision penalty. We adapt deep deterministic policy gradient (DDPG) with centralized training and decentralized execution, using actor-critic networks and a global state-space matrix to obtain the flocking control policy. Simulation results demonstrate that the trained policy converges to a flocking formation without parameter tuning and generalizes well across different UAVs.
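The reward shaping described in the abstract (a global flocking-maintenance term, a mutual reward, and a collision penalty) can be sketched as follows. All weights, distance thresholds, and the exact functional forms below are illustrative assumptions, not values taken from the paper:

```python
import math

def flocking_reward(positions, leader_pos,
                    d_safe=2.0, d_flock=10.0,
                    w_flock=1.0, w_mutual=0.5, w_collide=10.0):
    """Composite reward in the spirit of the paper's description.
    Thresholds (d_safe, d_flock) and weights are hypothetical."""
    n = len(positions)

    # Global flocking maintenance: penalize UAVs that stray beyond
    # d_flock of the leader.
    stray = [max(math.dist(p, leader_pos) - d_flock, 0.0) for p in positions]
    r_flock = -w_flock * sum(stray) / n

    # Mutual reward: encourage cohesion by penalizing large pairwise spread.
    pair_dists = [math.dist(positions[i], positions[j])
                  for i in range(n) for j in range(i + 1, n)]
    r_mutual = -w_mutual * sum(pair_dists) / len(pair_dists)

    # Collision penalty: fixed cost for each pair inside the safety radius.
    r_collide = -w_collide * sum(d < d_safe for d in pair_dists)

    return r_flock + r_mutual + r_collide
```

A tightly grouped, collision-free flock thus scores close to zero, while stragglers and near-collisions drive the reward down, which is the gradient signal the shared DDPG policy would be trained on.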