Global optimization of interception guidance law for maneuvering target based on reward reconstruction

Impact Factor: 7.5 · CAS Tier 1 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Wang Zhao, Qinglong Zhang, Ye Zhang, Jingyu Wang
DOI: 10.1016/j.eswa.2025.127372
Journal: Expert Systems with Applications, Volume 279, Article 127372
Published: 2025-04-03
URL: https://www.sciencedirect.com/science/article/pii/S0957417425009947
Citations: 0

Abstract

To address the maneuvering-target interception task under a finite field-of-view (FOV) constraint, this paper proposes a novel Reward-Guided Efficient Global Policy Learning (RGEGPL) method based on dynamic reward restructuring. The method aims to ensure efficient deep reinforcement learning (DRL) training while achieving a globally optimal policy with high interception accuracy and low average and total energy consumption. To improve the efficiency of the DRL process, the paper introduces an action-space design based on proportional navigation (PN), which prevents the agent from performing entirely random exploration during the initial phase in an unknown environment. A reward shaping module is also employed, together with a principled parameter-selection method. To address the spatiotemporal reward coupling introduced by the process rewards in the reward shaping module, the paper proposes a Spatiotemporal Coupling Decoupling (SCD) module based on dynamic reward reconstruction. This module resolves the coupling issue, preserving learning efficiency while allowing iterative policy optimization to converge to the globally optimal solution. Comparative simulations and Monte Carlo (MC) experiments show that the proposed method achieves more than three times the interception accuracy of classical methods while optimizing both average and total energy consumption. Compared with state-of-the-art DRL methods, interception accuracy improves by 11.91%, and average and total energy consumption improve by over 6.44%. Furthermore, under target-maneuver changes and constraints on decision frequency, the proposed method still exhibits strong robustness and strategic advantages. The trained policy also incurs lower computational cost at execution time than competing methods.
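The PN-based action-space design mentioned in the abstract can be illustrated with a minimal sketch: instead of emitting a raw acceleration command, the agent outputs a bounded correction around the classical PN command, so early exploration stays near a sensible guidance law. The function names and the bounded-bias parameterization below are illustrative assumptions, not the paper's actual implementation.

```python
def pn_command(nav_gain, closing_speed, los_rate):
    """Classical proportional-navigation (PN) acceleration command:
    a = N * Vc * lambda_dot."""
    return nav_gain * closing_speed * los_rate

def shaped_action(agent_output, nav_gain, closing_speed, los_rate, bias_limit):
    """Sketch of a PN-anchored action space: the RL agent's raw output
    (clipped to [-1, 1]) scales a bounded bias added to the PN baseline,
    so even an untrained policy behaves like PN plus small noise."""
    bias = max(-1.0, min(1.0, agent_output)) * bias_limit
    return pn_command(nav_gain, closing_speed, los_rate) + bias
```

With a zero agent output the command reduces exactly to PN, which is what keeps initial exploration from being entirely random.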
The proposed method demonstrates superior performance and adaptability across multiple DRL baseline algorithms. Even in resource-constrained environments with limited decision-making frequency, the trained model maintains an advantage over both classical methods and state-of-the-art DRL-based methods.
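The reward shaping referenced in the abstract is commonly realized as potential-based shaping (Ng et al., 1999), which adds dense process rewards without altering the optimal policy. The sketch below shows that generic technique only; it is not the paper's SCD module, and the line-of-sight-rate potential is a hypothetical choice for illustration.

```python
def shaped_reward(env_reward, phi_s, phi_s_next, gamma=0.99):
    """Potential-based reward shaping: r' = r + gamma*phi(s') - phi(s).
    The added term is a telescoping difference of potentials, so the
    optimal policy of the shaped MDP matches the original one."""
    return env_reward + gamma * phi_s_next - phi_s

def los_rate_potential(los_rate):
    """Hypothetical potential: penalize large line-of-sight rates, which
    PN theory associates with growing miss distance."""
    return -abs(los_rate)
```

For example, a transition that halves the line-of-sight rate yields a positive process reward even when the sparse environment reward is zero.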
Source journal: Expert Systems with Applications (Engineering, Electrical & Electronic)
CiteScore: 13.80 · Self-citation rate: 10.60% · Articles per year: 2045 · Review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.