Research on Autonomous Decision-Making of UCAV Based on Deep Reinforcement Learning
L. Wang, Hongtao Wei
2022 3rd Information Communication Technologies Conference (ICTC), 2022-05-06
DOI: https://doi.org/10.1109/ictc55111.2022.9778652
To raise the intelligence level of training opponents in UCAV air combat simulation, and to improve the realism and immersion of air combat simulation in 3D space, this paper proposes a deep reinforcement learning algorithm for UCAV autonomous control based on virtual reality technology. Reinforcement learning is combined with Unity3D to train UCAV agents to carry out air combat tasks in a 3D virtual reality space, and imitation learning is added to improve the efficiency of policy generation. Multiple perceptrons are used to simplify the agent’s acquisition of environmental state data, and reward functions are designed by integrating UCAV angle, speed, and altitude considerations. The entire process in which the UCAV agents interact with the environment during reinforcement learning training is visualized in 3D.
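The abstract does not give the functional form of the reward, so the following is only a minimal sketch of how angle, speed, and altitude terms might be combined into one scalar reward. Every name here (`angle_reward`, `speed_reward`, `altitude_reward`, `combined_reward`), the shaping functions, the weights, and the constants `best_speed` and `alt_scale` are assumptions chosen for illustration, not the authors' design.

```python
import math


def angle_reward(aspect_deg: float) -> float:
    """Assumed shaping: 1.0 with the target dead ahead, falling linearly to -1.0 at 180 degrees."""
    return 1.0 - 2.0 * abs(aspect_deg) / 180.0


def speed_reward(speed: float, best_speed: float) -> float:
    """Assumed shaping: peaks at 1.0 when speed equals best_speed, Gaussian decay away from it."""
    return math.exp(-((speed - best_speed) / (0.5 * best_speed)) ** 2)


def altitude_reward(alt_diff: float, scale: float) -> float:
    """Assumed shaping: bounded in (-1, 1), positive when the agent is above the opponent."""
    return math.tanh(alt_diff / scale)


def combined_reward(
    aspect_deg: float,
    speed: float,
    alt_diff: float,
    best_speed: float = 250.0,   # hypothetical preferred speed, m/s
    alt_scale: float = 1000.0,   # hypothetical altitude scale, m
    weights: tuple = (0.5, 0.25, 0.25),  # hypothetical weights for angle, speed, altitude
) -> float:
    """Weighted sum of the three shaping terms."""
    w_angle, w_speed, w_alt = weights
    return (
        w_angle * angle_reward(aspect_deg)
        + w_speed * speed_reward(speed, best_speed)
        + w_alt * altitude_reward(alt_diff, alt_scale)
    )


if __name__ == "__main__":
    # Target dead ahead, at the preferred speed, co-altitude.
    print(combined_reward(aspect_deg=0.0, speed=250.0, alt_diff=0.0))
    # Target directly behind, same speed and altitude.
    print(combined_reward(aspect_deg=180.0, speed=250.0, alt_diff=0.0))
```

Keeping each term bounded means a single weight tuple controls the trade-off between geometry, energy, and height. In practice the weights and shaping functions would need tuning against the simulation.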