Optimal Multi-impulse Linear Rendezvous via Reinforcement Learning

Longwei Xu, Gang Zhang, Shi Qiu, Xibin Cao
{"title":"Optimal Multi-impulse Linear Rendezvous via Reinforcement Learning","authors":"Longwei Xu, Gang Zhang, Shi Qiu, Xibin Cao","doi":"10.34133/space.0047","DOIUrl":null,"url":null,"abstract":"A reinforcement learning-based approach is proposed to design the multi-impulse rendezvous trajectories in linear relative motions. For the relative motion in elliptical orbits, the relative state propagation is obtained directly from the state transition matrix. This rendezvous problem is constructed as a Markov decision process that reflects the fuel consumption, the transfer time, the relative state, and the dynamical model. An actor–critic algorithm is used to train policy for generating rendezvous maneuvers. The results of the numerical optimization (e.g., differential evolution) are adopted as the expert data set to accelerate the training process. By deploying a policy network, the multi-impulse rendezvous trajectories can be obtained on board. Moreover, the proposed approach is also applied to generate a feasible solution for many impulses (e.g., 20 impulses), which can be used as an initial value for further optimization. The numerical examples with random initial states show that the proposed method is much faster and has slightly worse performance indexes when compared with the evolutionary algorithm.","PeriodicalId":136587,"journal":{"name":"Space: Science & Technology","volume":"71 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Space: Science & Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34133/space.0047","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

A reinforcement learning-based approach is proposed to design multi-impulse rendezvous trajectories under linear relative motion. For relative motion in elliptical orbits, the relative state is propagated directly via the state transition matrix. The rendezvous problem is formulated as a Markov decision process that reflects the fuel consumption, the transfer time, the relative state, and the dynamical model. An actor–critic algorithm is used to train a policy that generates rendezvous maneuvers, and the results of numerical optimization (e.g., differential evolution) are adopted as an expert data set to accelerate training. Once the policy network is deployed, multi-impulse rendezvous trajectories can be obtained on board. Moreover, the proposed approach can generate a feasible solution for a large number of impulses (e.g., 20 impulses), which can serve as an initial guess for further optimization. Numerical examples with random initial states show that the proposed method is much faster than the evolutionary algorithm, at the cost of slightly worse performance indexes.
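The abstract gives no code, but the propagation step it describes is easy to sketch. Below is a minimal illustration, assuming the Clohessy-Wiltshire model (the circular-orbit special case of the elliptical-orbit state transition matrix the paper uses) and a hypothetical reward that penalizes fuel consumption and the residual relative state, in the spirit of the Markov-decision-process formulation. The step and reward functions and the weights w_fuel and w_state are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cw_stm(n: float, t: float) -> np.ndarray:
    """Clohessy-Wiltshire state transition matrix for the relative state
    [x, y, z, vx, vy, vz] (x radial, y along-track, z cross-track),
    with n the mean motion of the reference circular orbit."""
    s, c = np.sin(n * t), np.cos(n * t)
    return np.array([
        [4 - 3*c,       0.0,  0.0,  s/n,           2*(1 - c)/n,     0.0],
        [6*(s - n*t),   1.0,  0.0, -2*(1 - c)/n,  (4*s - 3*n*t)/n,  0.0],
        [0.0,           0.0,  c,    0.0,           0.0,             s/n],
        [3*n*s,         0.0,  0.0,  c,             2*s,             0.0],
        [-6*n*(1 - c),  0.0,  0.0, -2*s,           4*c - 3,         0.0],
        [0.0,           0.0, -n*s,  0.0,           0.0,             c  ],
    ])

def step(state: np.ndarray, dv: np.ndarray, dt: float, n: float) -> np.ndarray:
    """One MDP transition: apply an impulse dv to the velocity, then coast
    for dt under the linear dynamics via the state transition matrix."""
    state = state.copy()
    state[3:] += dv
    return cw_stm(n, dt) @ state

def reward(state: np.ndarray, dv: np.ndarray,
           w_fuel: float = 1.0, w_state: float = 1.0) -> float:
    """Hypothetical shaping: penalize delta-v magnitude (fuel) and the
    remaining relative-state error."""
    return -w_fuel * np.linalg.norm(dv) - w_state * np.linalg.norm(state)

# Usage: propagate one chaser impulse-and-coast segment in low Earth orbit.
n = 0.0011                                              # mean motion [rad/s]
state = np.array([1000.0, 2000.0, 0.0, 0.0, 0.0, 0.0])  # [m, m, m, m/s, m/s, m/s]
dv = np.array([0.1, -0.2, 0.0])                         # impulse [m/s]
state = step(state, dv, dt=600.0, n=n)
print(reward(state, dv))
```

In this framing, the trained policy network would map the current relative state to the next (dv, dt) pair, with the actor-critic update and the differential-evolution expert data handled by a standard training loop.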