UAV Air Combat Autonomous Maneuver Decision Based on DDPG Algorithm

Qiming Yang, Yan Zhu, Jiandong Zhang, Shasha Qiao, Jieling Liu
2019 IEEE 15th International Conference on Control and Automation (ICCA), July 2019. DOI: 10.1109/ICCA.2019.8899703
Citations: 41

Abstract

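Based on reinforcement learning theory, this paper establishes a learning model for autonomous air combat maneuver decision-making by UAVs. Because traditional reinforcement learning and the DQN algorithm cannot handle a continuous action space, the policy-gradient-based DDPG algorithm is adopted, which lets the model output continuous, smooth control values and thus improves the accuracy of the UAV's autonomous control. DDPG explores the action space by adding noise to the action values. However, randomly generated initial combinations of action values contain a large number of invalid or low-quality individuals, which makes learning inefficient and prone to converging to local optima. To address this problem, this paper uses an optimization algorithm to generate air combat maneuver action values and adds these optimized actions to the DDPG replay buffer as initial samples. This filters out a large number of invalid action values and ensures that the initial action values are valid, while preserving the possibility of diverse exploration and improving the learning efficiency of DDPG.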
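The abstract contrasts two ways of populating DDPG's replay buffer early in training: the actor's output perturbed by noise, and actions produced by an optimization algorithm. The abstract does not name the optimizer, the noise process, or the air combat model, so the following is only a minimal sketch under stated assumptions: Gaussian exploration noise (the original DDPG paper used an Ornstein-Uhlenbeck process), stochastic hill climbing as a stand-in optimizer, and a hypothetical one-dimensional toy problem (`toy_score`, `toy_step`) in place of the UAV dynamics and situation assessment. All names here are illustrative, not the authors' code.

```python
import random
from collections import deque
from typing import Callable, Deque, List, Sequence, Tuple

Vec = List[float]
# One DDPG transition: (state, action, reward, next_state, done).
Transition = Tuple[Vec, Vec, float, Vec, bool]


class ReplayBuffer:
    """Fixed-capacity FIFO replay buffer of the kind DDPG samples minibatches from."""

    def __init__(self, capacity: int) -> None:
        self.data: Deque[Transition] = deque(maxlen=capacity)

    def push(self, t: Transition) -> None:
        self.data.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement.
        return random.sample(list(self.data), batch_size)

    def __len__(self) -> int:
        return len(self.data)


def clip(a: Sequence[float], low: Sequence[float], high: Sequence[float]) -> Vec:
    return [min(max(x, lo), hi) for x, lo, hi in zip(a, low, high)]


def noisy_action(mu: Sequence[float], sigma: float,
                 low: Sequence[float], high: Sequence[float]) -> Vec:
    """Baseline DDPG exploration: actor output plus noise, clipped to bounds.
    Assumption: Gaussian noise instead of an Ornstein-Uhlenbeck process."""
    return clip([m + random.gauss(0.0, sigma) for m in mu], low, high)


def optimize_action(state: Vec, score: Callable[[Vec, Vec], float],
                    low: Sequence[float], high: Sequence[float],
                    evals: int = 600, step_frac: float = 0.2) -> Vec:
    """Hypothetical stand-in for the paper's unspecified optimizer:
    stochastic hill climbing that keeps the best-scoring action found."""
    best = [random.uniform(lo, hi) for lo, hi in zip(low, high)]
    best_score = score(state, best)
    for _ in range(evals):
        cand = clip([b + random.gauss(0.0, step_frac * (hi - lo))
                     for b, lo, hi in zip(best, low, high)], low, high)
        s = score(state, cand)
        if s > best_score:  # accept only improvements
            best, best_score = cand, s
    return best


def seed_buffer(buf: ReplayBuffer, states: Sequence[Vec],
                score: Callable[[Vec, Vec], float],
                env_step: Callable[[Vec, Vec], Tuple[Vec, float, bool]],
                low: Sequence[float], high: Sequence[float]) -> None:
    """Store optimizer-generated transitions as initial replay samples,
    before the usual noise-driven exploration takes over."""
    for s in states:
        a = optimize_action(s, score, low, high)
        s_next, r, done = env_step(s, a)
        buf.push((s, a, r, s_next, done))


# Hypothetical toy problem: state = heading error, action = turn command in [-1, 1].
def toy_score(s: Vec, a: Vec) -> float:
    return -(s[0] + a[0]) ** 2


def toy_step(s: Vec, a: Vec) -> Tuple[Vec, float, bool]:
    nxt = [s[0] + a[0]]
    return nxt, -nxt[0] ** 2, abs(nxt[0]) < 1e-3


random.seed(0)
LOW, HIGH = [-1.0], [1.0]
buf = ReplayBuffer(capacity=10_000)
start_states = [[random.uniform(-0.8, 0.8)] for _ in range(50)]
seed_buffer(buf, start_states, toy_score, toy_step, LOW, HIGH)
rewards = [r for _, _, r, _, _ in buf.data]
print(len(buf), min(rewards))
```

In a full implementation the seeded transitions would share the buffer with noise-driven ones, so DDPG's minibatches would mix both sources; that mixing is one plausible reading of how the approach discards invalid initial actions while still keeping exploration diverse.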