Reinforcement learning-based missile terminal guidance of maneuvering targets with decoys

IF 5.3 · Zone 1 (Engineering Technology) · JCR Q1 (ENGINEERING, AEROSPACE)
Tianbo DENG, Hao HUANG, Yangwang FANG, Jie YAN, Haoyu CHENG
Chinese Journal of Aeronautics, Volume 36, Issue 12, Pages 309-324. Published 2023-12-01. DOI: 10.1016/j.cja.2023.05.028
Full text: https://www.sciencedirect.com/science/article/pii/S1000936123001851
Citations: 0

Abstract

This paper proposes a missile terminal guidance law based on a new Deep Deterministic Policy Gradient (DDPG) algorithm to intercept a maneuvering target equipped with an infrared decoy. First, because the missile cannot accurately distinguish the target from the decoy, the energy center method is used to obtain the equivalent energy center of the target and decoy (called the virtual target), and a model of the missile and the virtual target is established. Then, an improved DDPG algorithm based on a trusted-search strategy is proposed, which significantly increases the training efficiency of the previous DDPG algorithm. Furthermore, an intelligent missile terminal guidance scheme is proposed by combining the established model, the network obtained by the improved DDPG algorithm, and the reward function. Specifically, a heuristic reward function is designed for training and learning in combat scenarios. Finally, the effectiveness and robustness of the proposed guidance law are verified by Monte Carlo tests, and the simulation results of the proposed scheme are compared with those of other methods to further demonstrate its superior performance.
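The energy center idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the virtual target is the intensity-weighted centroid of the infrared sources in the seeker's field of view; the abstract does not give the paper's exact formulation, so the function name `energy_center`, the 2-D positions, and the intensity values below are all hypothetical.

```python
import numpy as np


def energy_center(positions, intensities):
    """Intensity-weighted centroid of several infrared sources.

    positions   : (n, d) array-like of source positions (target, decoys, ...)
    intensities : (n,) array-like of non-negative radiant intensities
                  (hypothetical values; any consistent unit works)
    """
    positions = np.asarray(positions, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    total = intensities.sum()
    if total <= 0.0:
        raise ValueError("total intensity must be positive")
    # Weighted average of the rows of `positions`.
    return intensities @ positions / total


# Hypothetical geometry: the decoy radiates three times as strongly as the target.
target = [1000.0, 200.0]
decoy = [1000.0, 100.0]
virtual_target = energy_center([target, decoy], [1.0, 3.0])
print(virtual_target)
```

Under this assumption the virtual target is pulled toward the brighter source, which is why a guidance law that tracks it has to remain robust as the decoy separates from the target.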

Source journal
Chinese Journal of Aeronautics (Engineering Technology - Engineering: Aerospace)
CiteScore: 10.00
Self-citation rate: 17.50%
Articles published: 3080
Review time: 55 days
Journal description: Chinese Journal of Aeronautics (CJA) is an open access, peer-reviewed international journal covering all aspects of aerospace engineering. The Journal reports scientific and technological achievements and frontiers in aeronautic and astronautic engineering, in both theory and practice. It publishes theoretical and experimental research articles, research notes, comprehensive reviews, technological briefs, and other reports on the latest developments in aeronautics and astronautics, as well as related ground equipment.