Optimal Multi-impulse Linear Rendezvous via Reinforcement Learning

Longwei Xu, Gang Zhang, Shi Qiu, Xibin Cao
{"title":"Optimal Multi-impulse Linear Rendezvous via Reinforcement Learning","authors":"Longwei Xu, Gang Zhang, Shi Qiu, Xibin Cao","doi":"10.34133/space.0047","DOIUrl":null,"url":null,"abstract":"A reinforcement learning-based approach is proposed to design the multi-impulse rendezvous trajectories in linear relative motions. For the relative motion in elliptical orbits, the relative state propagation is obtained directly from the state transition matrix. This rendezvous problem is constructed as a Markov decision process that reflects the fuel consumption, the transfer time, the relative state, and the dynamical model. An actor–critic algorithm is used to train policy for generating rendezvous maneuvers. The results of the numerical optimization (e.g., differential evolution) are adopted as the expert data set to accelerate the training process. By deploying a policy network, the multi-impulse rendezvous trajectories can be obtained on board. Moreover, the proposed approach is also applied to generate a feasible solution for many impulses (e.g., 20 impulses), which can be used as an initial value for further optimization. The numerical examples with random initial states show that the proposed method is much faster and has slightly worse performance indexes when compared with the evolutionary algorithm.","PeriodicalId":136587,"journal":{"name":"Space: Science & Technology","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Space: Science & Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34133/space.0047","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

A reinforcement learning-based approach is proposed to design multi-impulse rendezvous trajectories under linear relative motion. For relative motion in elliptical orbits, the relative state is propagated directly via the state transition matrix. The rendezvous problem is formulated as a Markov decision process that reflects the fuel consumption, the transfer time, the relative state, and the dynamical model. An actor–critic algorithm is used to train a policy that generates rendezvous maneuvers, and the results of numerical optimization (e.g., differential evolution) are adopted as an expert data set to accelerate training. Once the policy network is deployed, multi-impulse rendezvous trajectories can be obtained on board. Moreover, the proposed approach can generate a feasible solution for a large number of impulses (e.g., 20 impulses), which can serve as an initial guess for further optimization. Numerical examples with random initial states show that the proposed method is much faster than the evolutionary algorithm, at the cost of slightly worse performance indexes.
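The abstract gives no code, but the propagation step it describes is easy to sketch. Below is a minimal illustration, assuming the Clohessy-Wiltshire model (the circular-orbit special case of the elliptical-orbit state transition matrix the paper uses) and a hypothetical reward that penalizes fuel consumption and the residual relative state, in the spirit of the Markov-decision-process formulation. The step and reward functions and the weights w_fuel and w_state are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cw_stm(n: float, t: float) -> np.ndarray:
    """Clohessy-Wiltshire state transition matrix for the relative state
    [x, y, z, vx, vy, vz] (x radial, y along-track, z cross-track),
    with n the mean motion of the reference circular orbit."""
    s, c = np.sin(n * t), np.cos(n * t)
    return np.array([
        [4 - 3*c,       0.0,  0.0,  s/n,           2*(1 - c)/n,     0.0],
        [6*(s - n*t),   1.0,  0.0, -2*(1 - c)/n,  (4*s - 3*n*t)/n,  0.0],
        [0.0,           0.0,  c,    0.0,           0.0,             s/n],
        [3*n*s,         0.0,  0.0,  c,             2*s,             0.0],
        [-6*n*(1 - c),  0.0,  0.0, -2*s,           4*c - 3,         0.0],
        [0.0,           0.0, -n*s,  0.0,           0.0,             c  ],
    ])

def step(state: np.ndarray, dv: np.ndarray, dt: float, n: float) -> np.ndarray:
    """One MDP transition: apply an impulse dv to the velocity, then coast
    for dt under the linear dynamics via the state transition matrix."""
    state = state.copy()
    state[3:] += dv
    return cw_stm(n, dt) @ state

def reward(state: np.ndarray, dv: np.ndarray,
           w_fuel: float = 1.0, w_state: float = 1.0) -> float:
    """Hypothetical shaping: penalize delta-v magnitude (fuel) and the
    remaining relative-state error."""
    return -w_fuel * np.linalg.norm(dv) - w_state * np.linalg.norm(state)

# Usage: propagate one chaser impulse-and-coast segment in low Earth orbit.
n = 0.0011                                              # mean motion [rad/s]
state = np.array([1000.0, 2000.0, 0.0, 0.0, 0.0, 0.0])  # [m, m, m, m/s, m/s, m/s]
dv = np.array([0.1, -0.2, 0.0])                         # impulse [m/s]
state = step(state, dv, dt=600.0, n=n)
print(reward(state, dv))
```

In this framing, the trained policy network would map the current relative state to the next (dv, dt) pair, with the actor-critic update and the differential-evolution expert data handled by a standard training loop.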