Research on Intelligent Maneuvering Decision in Close Air Combat Based on Deep Q Network
Tingyu Zhang, Chen Zheng, Mingwei Sun, Yongshuai Wang, Zengqiang Chen
2023 IEEE 12th Data Driven Control and Learning Systems Conference (DDCLS), 2023-05-12
DOI: 10.1109/DDCLS58216.2023.10166948
This paper studies maneuvering decision-making for an Unmanned Combat Aerial Vehicle (UCAV) in close air combat. Building on the deep Q network (DQN) algorithm, it examines how to design the reinforcement learning (RL) reward function and how to select hyperparameters. To address the sparse-reward problem of RL, the authors propose an auxiliary reward function that accounts for angle, range, altitude, and speed. To address hyperparameter selection, they explore how the learning rate and the number of network nodes and layers affect the decision-making system. They also give a suitable range for these parameters as a reference for later research on parameter selection. The simulation results show that the trained agent can obtain the optimal maneuvering strategy in different air combat situations, but that the agent is sensitive to the RL hyperparameters.
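The abstract names the four factors in the auxiliary reward but does not give their functional forms. The following is therefore only a minimal sketch of how such a dense shaping reward might be combined with a sparse terminal reward and fed into a standard DQN target. Everything here is a hypothetical illustration and not the authors' formulation. That includes the names (AircraftState, auxiliary_reward, total_reward, dqn_td_target), the antenna-train-angle/aspect-angle angle term, the Gaussian range term, the tanh altitude and speed terms, and all weights and scale constants. Only the TD target y = r + gamma * max_a' Q_target(s', a') is the standard DQN update rule.

```python
import math
from dataclasses import dataclass


@dataclass
class AircraftState:
    """Hypothetical point-mass state: position (m) and velocity (m/s); z is altitude."""
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float

    def speed(self) -> float:
        return math.sqrt(self.vx ** 2 + self.vy ** 2 + self.vz ** 2)


def _angle_between(a, b) -> float:
    """Angle in [0, pi] between two 3-D vectors; pi/2 (neutral) if either is zero."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    na = math.sqrt(sum(ai * ai for ai in a))
    nb = math.sqrt(sum(bi * bi for bi in b))
    if na == 0.0 or nb == 0.0:
        return math.pi / 2
    return math.acos(max(-1.0, min(1.0, dot / (na * nb))))


def auxiliary_reward(own, tgt, d_opt=1000.0, d_sigma=500.0,
                     h_scale=1000.0, v_scale=50.0,
                     weights=(0.4, 0.3, 0.15, 0.15)) -> float:
    """Hypothetical dense shaping reward from angle, range, altitude and speed terms."""
    los = (tgt.x - own.x, tgt.y - own.y, tgt.z - own.z)   # line of sight, own -> target
    ata = _angle_between((own.vx, own.vy, own.vz), los)    # 0 = own nose on target
    aa = _angle_between((tgt.vx, tgt.vy, tgt.vz), los)     # 0 = own aircraft on target's tail
    r_angle = 1.0 - (ata + aa) / (2.0 * math.pi)           # in [0, 1]
    dist = math.sqrt(sum(c * c for c in los))
    r_range = math.exp(-((dist - d_opt) / d_sigma) ** 2)   # peaks at 1 when dist == d_opt
    r_alt = math.tanh((own.z - tgt.z) / h_scale)           # altitude advantage, in (-1, 1)
    r_speed = math.tanh((own.speed() - tgt.speed()) / v_scale)  # speed advantage, in (-1, 1)
    w_a, w_r, w_h, w_v = weights
    return w_a * r_angle + w_r * r_range + w_h * r_alt + w_v * r_speed


def total_reward(own, tgt, terminal_outcome=0, terminal_bonus=10.0) -> float:
    """Sparse terminal reward (+1 win, -1 loss, 0 otherwise) plus the dense shaping term."""
    return terminal_outcome * terminal_bonus + auxiliary_reward(own, tgt)


def dqn_td_target(reward, next_q_values, gamma=0.99, done=False) -> float:
    """Standard DQN target: r + gamma * max_a' Q_target(s', a'), with no bootstrap at terminal states."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    own = AircraftState(0.0, 0.0, 3000.0, 200.0, 0.0, 0.0)
    tgt = AircraftState(1000.0, 0.0, 2500.0, 180.0, 0.0, 0.0)
    r = total_reward(own, tgt)
    print(r, dqn_td_target(r, [0.2, 0.5, 0.1]))
```

Because the shaping terms are dense, the agent gets a graded signal at every step instead of only at episode end. In this sketch, an aircraft sitting on the target's tail at the preferred range scores higher than one being chased. The weights and scale constants are further design parameters that would need tuning alongside the learning rate and network size.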