Wang Zhao, Qinglong Zhang, Ye Zhang, Jingyu Wang
Title: Global optimization of interception guidance law for maneuvering target based on reward reconstruction
DOI: 10.1016/j.eswa.2025.127372
Journal: Expert Systems with Applications, Volume 279, Article 127372
Published: 2025-04-03 (Journal Article)
Impact Factor: 7.5 | JCR: Q1 (Computer Science, Artificial Intelligence)
URL: https://www.sciencedirect.com/science/article/pii/S0957417425009947
Citations: 0
Abstract
To address the maneuvering target interception task under a finite field-of-view (FOV), this paper proposes a novel Reward-Guided Efficient Global Policy Learning (RGEGPL) method based on dynamic reward restructuring. The method aims to ensure efficient deep reinforcement learning (DRL) training while achieving a globally optimal policy with high interception accuracy, low average energy consumption, and low total energy consumption. To enhance the efficiency of the DRL process, the paper introduces an action space design based on proportional navigation (PN), which prevents the agent from conducting entirely random exploration during the initial phase in an unknown environment. Additionally, a reward shaping module is employed, along with a rational parameter selection method. To address the spatiotemporal reward coupling problem caused by introducing process rewards in the reward shaping module, the paper proposes a Spatiotemporal Coupling Decoupling (SCD) module based on dynamic reward reconstruction. This module resolves the spatiotemporal coupling issue, ensuring the efficiency of the learning process while allowing iterative policy optimization to converge to the globally optimal solution. Comparative simulations and Monte Carlo (MC) experiments demonstrate that the proposed method achieves a policy with more than three times the interception accuracy of classical methods, while optimizing both average and total energy consumption. Compared to state-of-the-art DRL methods, interception accuracy improves by 11.91%, and average and total energy consumption improve by over 6.44%. Furthermore, under target maneuvering changes and constraints on decision frequency, the proposed method still exhibits strong robustness and strategic advantages. The trained policy is also more computationally efficient at deployment than competing methods.
The proposed method demonstrates superior performance and adaptability across multiple DRL baseline algorithms. Even in resource-constrained environments with limited decision-making frequency, the trained model maintains an advantage over both classical methods and state-of-the-art DRL-based methods.
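The abstract's two core ideas, anchoring the action space to a proportional-navigation (PN) baseline and shaping rewards without distorting the optimal policy, can be illustrated with a minimal sketch. This is not the paper's implementation: the names (`pn_acceleration`, `shaped_reward`, `act`), the clipping bound, and the use of classical potential-based shaping are illustrative assumptions; the paper's RGEGPL/SCD design is more elaborate.

```python
def pn_acceleration(nav_gain: float, closing_speed: float, los_rate: float) -> float:
    """Classical PN lateral acceleration command: a = N * Vc * lambda_dot,
    where N is the navigation gain, Vc the closing speed, and lambda_dot
    the line-of-sight (LOS) rotation rate."""
    return nav_gain * closing_speed * los_rate

def act(agent_delta: float, nav_gain: float, closing_speed: float,
        los_rate: float, max_correction: float = 10.0) -> float:
    """Action = PN baseline + clipped DRL correction. Even a randomly
    initialized agent then explores near a sensible guidance law instead
    of purely at random, which is the spirit of a PN-based action space."""
    delta = max(-max_correction, min(max_correction, agent_delta))
    return pn_acceleration(nav_gain, closing_speed, los_rate) + delta

def shaped_reward(base_reward: float, potential_prev: float,
                  potential_next: float, gamma: float = 0.99) -> float:
    """Potential-based reward shaping: r' = r + gamma*Phi(s') - Phi(s).
    Shaping of this form densifies the reward signal without changing
    the set of optimal policies."""
    return base_reward + gamma * potential_next - potential_prev
```

For example, with `nav_gain=3`, `closing_speed=100` m/s, and `los_rate=0.02` rad/s, the PN baseline command is 6 m/s², and a large agent correction is clipped to the ±10 m/s² bound before being added.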
Journal Introduction:
Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.