UAV Air Combat Autonomous Maneuver Decision Based on DDPG Algorithm

2019 IEEE 15th International Conference on Control and Automation (ICCA), Pub Date: 2019-07-01, DOI: 10.1109/ICCA.2019.8899703

Qiming Yang, Yan Zhu, Jiandong Zhang, Shasha Qiao, Jieling Liu

Citations: 41

Abstract

Based on reinforcement learning theory, this paper establishes a learning model for autonomous air combat maneuver decisions of a UAV. Because traditional reinforcement learning and the DQN algorithm cannot handle continuous action spaces, the policy-gradient-based DDPG algorithm is adopted, which enables the model to output continuous and smooth control values and thus improves the accuracy of autonomous UAV control. The DDPG algorithm explores the action space by adding noise to the action values. The randomly generated initial combinations of action values contain a large number of invalid or low-quality individuals, which leads to inefficient learning and a tendency to fall into local optima. To address this problem, this paper proposes using an optimization algorithm to generate air combat maneuver action values and adding these optimized actions to the DDPG replay buffer as initial samples. This method filters out a large number of invalid action values and guarantees the correctness of the action values. At the same time, the possibility of diverse exploration is preserved, and the learning efficiency of the DDPG algorithm is improved.
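The two mechanisms named in the abstract, noise-based exploration and a replay buffer pre-filled with optimized actions, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the one-dimensional `toy_step` dynamics, the action bounds, the Gaussian noise in `noisy_action`, and the random search in `optimized_action`, which stands in for the optimization algorithm the abstract leaves unnamed. The actor and critic networks are omitted entirely.

```python
import random
from collections import deque

# Hypothetical toy stand-in for the air combat model: the state is one number,
# the action is one control value in [-1, 1], and reward is highest at state 0.
ACTION_LOW, ACTION_HIGH = -1.0, 1.0


def toy_step(state, action):
    """Return (next_state, reward) for the assumed toy dynamics."""
    next_state = state + action
    return next_state, -abs(next_state)


def clip(value, low=ACTION_LOW, high=ACTION_HIGH):
    """Clamp a control value to the valid action range."""
    return max(low, min(high, value))


def optimized_action(state, rng, candidates=64):
    """Random search, used here as a placeholder optimizer (an assumption):
    sample candidate actions and keep the one with the best one-step reward."""
    best_action, best_reward = 0.0, toy_step(state, 0.0)[1]
    for _ in range(candidates):
        action = rng.uniform(ACTION_LOW, ACTION_HIGH)
        reward = toy_step(state, action)[1]
        if reward > best_reward:
            best_action, best_reward = action, reward
    return best_action


def noisy_action(action, rng, sigma=0.1):
    """Exploration: add Gaussian noise to an action value, then clip to bounds."""
    return clip(action + rng.gauss(0.0, sigma))


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.storage.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.storage), min(batch_size, len(self.storage)))

    def __len__(self):
        return len(self.storage)


def seed_buffer(buffer, start_states, rng):
    """Pre-fill the buffer with transitions produced by optimized actions."""
    for state in start_states:
        action = optimized_action(state, rng)
        next_state, reward = toy_step(state, action)
        buffer.add(state, action, reward, next_state)


rng = random.Random(0)
buffer = ReplayBuffer(capacity=1000)
start_states = [rng.uniform(-1.0, 1.0) for _ in range(50)]
seed_buffer(buffer, start_states, rng)   # initial samples come from the optimizer
batch = buffer.sample(8, rng)            # later training batches draw on them
```

In a full training loop, transitions collected with `noisy_action` applied to the actor output would be appended to the same buffer, so the seeded samples would give early updates reasonable targets while the noise keeps exploration open.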
