An optimal guidance law based on deep reinforcement learning for compensating the lag in time-varying slope path following

Impact Factor 5.8; Zone 1 (Engineering & Technology); JCR Q1 (Engineering, Aerospace)

Aerospace Science and Technology Pub Date : 2025-09-22 DOI:10.1016/j.ast.2025.110948

Zibo Wang, Qidan Zhu, Tianrui Zhao, Lipeng Wang

Aerospace Science and Technology, Volume 168, Article 110948. Journal Article. Not open access. Source: https://www.sciencedirect.com/science/article/pii/S1270963825010120

Citations: 0

Abstract

During carrier landing, carrier motion induces real-time variations in the desired slope path. Considering the inherent lag in aircraft position adjustments, this paper proposes an optimal guidance law based on deep reinforcement learning (DRL) to compensate for the lag. First, the carrier landing process is modeled as a Finite Markov Decision Process (FMDP), and a comprehensive DRL framework is developed. Second, a novel Soft Actor-Critic method enhanced with a Long Short-Term Memory (LSTM) network and an attention mechanism (AM), termed LA-SAC, is introduced. The method extracts deck motion features with the LSTM network and uses the AM to adjust the weights of different state data, which improves learning efficiency. Additionally, a distributed neural network is designed to integrate deck motion prediction and compensation, avoiding the complex parameter tuning of conventional methods. LA-SAC leverages full-dimensional data to train the network and derive an optimal guidance law. Finally, the superiority of the proposed method is verified on a semi-physical simulation platform. Compared to DRL baselines, LA-SAC converges faster and derives a superior guidance policy. Compared to conventional methods, the proposed method provides a larger lead margin, which reduces landing errors. Furthermore, ablation experiments confirm the effectiveness of the LSTM network and AM modules, and a real-time analysis validates the practicality of the LA-SAC algorithm in actual implementation.
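
To make the lag problem in the abstract concrete, the following is a minimal sketch and not the paper's method. It assumes the aircraft height response behaves like a first-order lag with time constant `tau`, that the deck heave is a single sinusoid, and that a fixed lead time stands in for the learned LA-SAC policy. All names and numeric values (`rms_tracking_error`, `TAU`, `OMEGA`, the 120 s duration) are hypothetical and chosen only for illustration.

```python
import math

def rms_tracking_error(lead_time, tau=1.5, omega=0.6, amplitude=1.0,
                       dt=0.01, duration=120.0):
    """RMS error between deck height and a first-order-lag aircraft response.

    Assumed toy model (not from the paper): dz/dt = (command - z) / tau.
    """
    z = 0.0                      # aircraft height deviation (m)
    steps = int(duration / dt)
    settle = steps // 2          # discard the start-up transient
    sq_sum = 0.0
    for k in range(steps):
        t = k * dt
        deck_now = amplitude * math.sin(omega * t)
        # Command the deck height expected lead_time seconds ahead.
        command = amplitude * math.sin(omega * (t + lead_time))
        if k >= settle:
            sq_sum += (deck_now - z) ** 2
        z += dt * (command - z) / tau   # forward-Euler step of the lag
    return math.sqrt(sq_sum / (steps - settle))

TAU, OMEGA = 1.5, 0.6            # hypothetical lag (s) and heave frequency (rad/s)
# Lead time that cancels the phase lag of the assumed first-order response.
phase_matching_lead = math.atan(OMEGA * TAU) / OMEGA

rms_without_lead = rms_tracking_error(0.0, TAU, OMEGA)
rms_with_lead = rms_tracking_error(phase_matching_lead, TAU, OMEGA)
print(f"RMS error without lead: {rms_without_lead:.3f} m")
print(f"RMS error with lead:    {rms_with_lead:.3f} m")
```

In this toy model, commanding against the predicted deck position removes the phase error but not the amplitude attenuation of the lag, so some residual error remains. A learned policy such as the one the abstract describes would presumably have to handle both effects, along with deck motion that is not a clean sinusoid.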


Source journal

Aerospace Science and Technology (Engineering & Technology: Engineering, Aerospace)

CiteScore: 10.30
Self-citation rate: 28.60%
Articles published: 654
Review time: 54 days

About the journal: Aerospace Science and Technology publishes articles of outstanding scientific quality. Each article is reviewed by two referees. The journal welcomes papers from a wide range of countries. It publishes original papers, review articles and short communications related to all fields of aerospace research, fundamental and applied, whose potential applications are clearly related to:
• The design and the manufacture of aircraft, helicopters, missiles, launchers and satellites
• The control of their environment
• The study of various systems they are involved in, as supports or as targets.

Authors are invited to submit papers on new advances in the following topics as applied to aerospace:
• Fluid dynamics
• Energetics and propulsion
• Materials and structures
• Flight mechanics
• Navigation, guidance and control
• Acoustics
• Optics
• Electromagnetism and radar
• Signal and image processing
• Information processing
• Data fusion
• Decision aid
• Human behaviour
• Robotics and intelligent systems
• Complex system engineering, etc.