An event-triggered deep reinforcement learning guidance algorithm for intercepting maneuvering target of speed advantage

Xu Wang, Yifan Deng, Yuanli Cai, Haonan Jiang
DOI: 10.1016/j.isatra.2025.04.035
Journal: ISA Transactions
Published: 2025-04-30 (Journal Article)
Citations: 0

Abstract


Hypersonic vehicle interception raises stringent and challenging requirements for traditional guidance laws in terms of speed and maneuverability advantage. Deep reinforcement learning algorithms provide potential solutions for intercepting maneuvering targets of speed advantage, but they are greatly hindered by policy training inefficiency. To address these limitations, we propose a novel event-triggered deep reinforcement learning (ETDRL) algorithm along with an event-triggered training and time-triggered execution (ETTE) framework. The ETTE framework reformulates the agent-environment interaction as an event-triggered Markov decision process (ETMDP) model, where the agent updates its action only when the environment state meets a specific event triggering condition, otherwise maintaining the previous behavior between events. As a result, this approach significantly accelerates policy training by reducing the total number of decision steps required during the learning phase. To mitigate the potential degradation in control performance caused by the event-triggered mechanism, the ETTE framework enables well-trained policies to be executed with a fixed decision interval, that is, in a time-triggered way. Based on the proposed method, an ETDRL guidance law is developed for intercepting maneuvering targets of speed advantage under constraints of limited maneuverability, large initial heading error, and bearings-only measurement. By following the design principle of nullifying the line-of-sight angular rate to establish a collision course with the target, we model the guidance problem as an ETMDP. The twin delayed deep deterministic policy gradient algorithm is utilized to train the ETDRL guidance law. Numerical simulations demonstrate the superiority of the proposed ETDRL method over DRL algorithms in terms of policy training efficiency, while also highlighting its enhanced guidance performance over traditional methods.
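The abstract's core mechanism — the agent re-decides only when the environment state meets a triggering condition, and otherwise holds its previous command — can be sketched as a short simulation loop. This is a minimal illustration, not the paper's implementation: the abstract does not specify the triggering condition, so the sketch assumes it compares the line-of-sight (LOS) angular rate against a hypothetical threshold, consistent with the stated design principle of nullifying the LOS rate. The `env` interface, the state layout, and the threshold value are all assumptions.

```python
import numpy as np

def los_rate(pos_m, vel_m, pos_t, vel_t):
    """Planar LOS angular rate: lambda_dot = (r x v_rel) / |r|^2,
    where r is the missile-to-target relative position (2D cross
    product reduces to a scalar)."""
    r = pos_t - pos_m          # relative position
    v = vel_t - vel_m          # relative velocity
    return (r[0] * v[1] - r[1] * v[0]) / (r @ r)

def event_triggered_episode(env, policy, trigger_threshold=0.01):
    """Run one episode under the event-triggered scheme sketched in
    the abstract: the policy is queried only when the triggering
    condition fires; between events the previous action is held
    (zero-order hold). Returns the number of decision steps, which
    is what the ETTE framework reduces during training."""
    state = env.reset()
    action = policy(state)             # the first decision is always taken
    done, decision_steps = False, 1
    while not done:
        state, reward, done = env.step(action)
        # Hypothetical triggering condition: re-decide when the LOS-rate
        # component of the state (assumed here to be state[0]) drifts
        # beyond the threshold, i.e. the collision course is degrading.
        if abs(state[0]) > trigger_threshold:
            action = policy(state)     # event fires -> update the action
            decision_steps += 1
        # otherwise: hold the previous action until the next event
    return decision_steps
```

At execution time, per the ETTE framework, the same trained policy would instead be queried at every fixed decision interval (time-triggered), trading the training-time savings for steadier control.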
