A deep reinforcement learning-based approach to onboard trajectory generation for hypersonic vehicles

The Aeronautical Journal (1968), Pub Date: 2023-02-08, DOI: 10.1017/aer.2023.4

C. Bao, X. Zhou, P. Wang, R. He, G. Tang

{"title":"A deep reinforcement learning-based approach to onboard trajectory generation for hypersonic vehicles","authors":"C. Bao, X. Zhou, P. Wang, R. He, G. Tang","doi":"10.1017/aer.2023.4","DOIUrl":null,"url":null,"abstract":"\n An onboard three-dimensional (3D) trajectory generation approach based on the reinforcement learning (RL) algorithm and deep neural network (DNN) is proposed for hypersonic vehicles in glide phase. Multiple trajectory samples are generated offline through the convex optimisation method. The deep learning (DL) is employed to pre-train the DNN for initialising the actor network and accelerating the RL process. Based on the offline deep policy deterministic actor-critic algorithm, a flight target-oriented reward function with path constraints is designed. The actor network is optimised by the end-to-end RL and policy gradients of the critic network until the reward function converges to the maximum. The actor network is considered as the onboard trajectory generator to compute optimal control values online based on the real-time motion states. The simulation results show that the single-step online planning time meets the real-time requirements of onboard trajectory generation. 
The significant improvement in terminal accuracy of the online trajectory and the better generalisation under biased initial states for hypersonic vehicles in glide phase is observed.","PeriodicalId":22567,"journal":{"name":"The Aeronautical Journal (1968)","volume":"12 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The Aeronautical Journal (1968)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1017/aer.2023.4","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Citations: 1

Abstract

An onboard three-dimensional (3D) trajectory generation approach based on reinforcement learning (RL) and a deep neural network (DNN) is proposed for hypersonic vehicles in the glide phase. Multiple trajectory samples are generated offline by convex optimisation. Deep learning (DL) is used to pre-train the DNN, which initialises the actor network and accelerates the RL process. A flight target-oriented reward function with path constraints is designed on the basis of the offline deep policy deterministic actor-critic algorithm. The actor network is optimised through end-to-end RL, using policy gradients from the critic network, until the reward converges to its maximum. The trained actor network then serves as the onboard trajectory generator, computing optimal control values online from the real-time motion states. Simulation results show that the single-step online planning time meets the real-time requirements of onboard trajectory generation. They also show a significant improvement in the terminal accuracy of the online trajectory and better generalisation under biased initial states.
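Two steps of the pipeline described in the abstract, supervised pre-training of the actor on offline samples and a target-oriented reward with path-constraint penalties, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the actor is a linear least-squares fit standing in for the DNN, the offline samples are synthetic instead of convex-optimised trajectories, the state and control dimensions are arbitrary, and the reward form, weights, and constraint values are hypothetical.

```python
import numpy as np


def pretrain_actor(states, controls):
    """Fit a linear policy u = [s, 1] @ W to offline (state, control) samples.

    Stand-in for the supervised DNN pre-training step; a real actor would be
    a multi-layer network trained by gradient descent.
    """
    X = np.hstack([states, np.ones((states.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(X, controls, rcond=None)
    return W


def actor(W, state):
    """Evaluate the pre-trained policy for a single state vector."""
    return np.append(state, 1.0) @ W


def reward(position, target, path_values, path_limits, terminal,
           w_terminal=1.0, w_path=0.1):
    """Target-oriented reward with path-constraint penalties (hypothetical form).

    path_values and path_limits hold the current value and the allowed maximum
    of each path constraint (which quantities are constrained is an assumption).
    """
    ratio = np.asarray(path_values, dtype=float) / np.asarray(path_limits, dtype=float)
    # Penalise only the fraction by which each limit is exceeded.
    violation = np.maximum(0.0, ratio - 1.0)
    r = -w_path * violation.sum()
    if terminal:
        # Terminal term: negative miss distance, so the best possible reward is 0.
        miss = np.asarray(position, dtype=float) - np.asarray(target, dtype=float)
        r -= w_terminal * np.linalg.norm(miss)
    return r


# Synthetic "offline samples": 200 states of dimension 6, 2 controls each.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 6))
true_W = rng.normal(size=(7, 2))
controls = np.hstack([states, np.ones((200, 1))]) @ true_W

W = pretrain_actor(states, controls)
u0 = actor(W, states[0])  # control command for the first sampled state

r_step = reward([0.0, 0.0], [3.0, 4.0], [1.5], [1.0], terminal=False)
r_final = reward([0.0, 0.0], [3.0, 4.0], [0.5], [1.0], terminal=True)
```

In an actor-critic scheme of the kind the abstract describes, the pre-trained weights would only initialise the actor; the critic would then be trained on rewards such as `reward(...)` and its gradients used to refine the actor further. That RL update is omitted here.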

