UAV Air Combat Autonomous Maneuver Decision Based on DDPG Algorithm
Qiming Yang, Yan Zhu, Jiandong Zhang, Shasha Qiao, Jieling Liu
2019 IEEE 15th International Conference on Control and Automation (ICCA), July 2019
DOI: 10.1109/ICCA.2019.8899703
Citation count: 41
Abstract
Based on reinforcement learning theory, this paper establishes a learning model for autonomous air combat maneuver decision-making by a UAV. Because traditional reinforcement learning methods and the DQN algorithm cannot handle a continuous action space, the policy-gradient-based DDPG (Deep Deterministic Policy Gradient) algorithm is adopted. It enables the model to output continuous, smooth control values, which improves the accuracy of the UAV's autonomous control. DDPG explores the action space by adding noise to the action values. However, the randomly generated initial action values contain a large number of invalid or low-quality individuals, which makes learning inefficient and causes it to converge to local optima. To address this problem, this paper uses an optimization algorithm to generate air combat maneuver action values and adds the optimized actions to the DDPG replay buffer as initial samples. This filters out a large number of invalid action values and ensures that the initial action values are valid, while still preserving exploration diversity and improving the learning efficiency of the DDPG algorithm.
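The abstract combines two mechanisms: DDPG's noise-based exploration around the actor's deterministic output, and a replay buffer seeded with optimized actions before training begins. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. The abstract does not name the optimization algorithm, the state and action variables, or the noise process, so `optimize_action` uses plain random search as a hypothetical stand-in, the bounds `ACTION_LOW`/`ACTION_HIGH` are assumed, Gaussian action noise is an assumed choice (the original DDPG work used an Ornstein-Uhlenbeck process), and `toy_score` and `toy_step` are illustrative placeholders for an air combat situation evaluation and a simulator step.

```python
import random
from collections import deque

# Assumed bounds for a hypothetical 3-D normalized control action; the
# abstract does not specify the UAV's action variables or their ranges.
ACTION_LOW = [-1.0, -1.0, -1.0]
ACTION_HIGH = [1.0, 1.0, 1.0]


def clip_action(action):
    """Clamp each action component into its allowed range."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, ACTION_LOW, ACTION_HIGH)]


def explore(actor_action, sigma=0.1, rng=random):
    """DDPG-style exploration: perturb the deterministic actor output with
    Gaussian noise (an assumed choice), then clip it back into bounds."""
    return clip_action([a + rng.gauss(0.0, sigma) for a in actor_action])


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def optimize_action(state, score_fn, n_candidates=200, rng=random):
    """Hypothetical stand-in for the paper's optimization algorithm: random
    search that keeps the highest-scoring in-bounds action for this state."""
    best, best_score = None, float("-inf")
    for _ in range(n_candidates):
        candidate = [rng.uniform(lo, hi) for lo, hi in zip(ACTION_LOW, ACTION_HIGH)]
        score = score_fn(state, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def seed_buffer(buffer, states, score_fn, step_fn):
    """Pre-fill the replay buffer with optimized transitions before training."""
    for state in states:
        action = optimize_action(state, score_fn)
        reward, next_state, done = step_fn(state, action)
        buffer.add(state, action, reward, next_state, done)


# --- Toy usage: hypothetical scoring and environment-step functions ---
def toy_score(state, action):
    # Prefer actions close to the state vector (purely illustrative).
    return -sum((a - s) ** 2 for a, s in zip(action, state))


def toy_step(state, action):
    # One-step episode: reward is the score, state is unchanged.
    return toy_score(state, action), state, True


random.seed(0)
buf = ReplayBuffer(capacity=1000)
states = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(10)]
seed_buffer(buf, states, toy_score, toy_step)
batch = buf.sample(4)                      # mini-batch for a DDPG update
noisy = explore([0.95, 0.0, -0.95])        # exploratory action for a new rollout
print(len(buf), len(batch))                # 10 4
```

In this sketch, seeding only changes what the buffer holds when training starts. Transitions collected later with `explore` are appended to the same FIFO buffer, so the optimized samples are gradually displaced once it fills, leaving room for noise-driven exploration.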