Robust solar sail trajectories using proximal policy optimization

IF 3.1 | Zone 2, Physics and Astrophysics | JCR Q1, Engineering, Aerospace

Acta Astronautica Pub Date : 2024-11-09 DOI:10.1016/j.actaastro.2024.10.065

Christian Bianchi, Lorenzo Niccolai, Giovanni Mengali

Acta Astronautica, vol. 226, pp. 702-715. Full text: https://www.sciencedirect.com/science/article/pii/S0094576524006398

Citations: 0

Abstract

Reinforcement learning is used to design minimum-time trajectories of solar sails subject to the typical sources of uncertainty associated with such a propulsion system, i.e., inaccurate knowledge of the sail’s optical properties and the presence of wrinkles on the sail membrane. A proximal policy optimization (PPO) algorithm is used to train the agent and derive the control policy that associates the optimal sail attitude with each dynamic state. First, the agent is trained assuming deterministic unperturbed dynamics, and the results are compared with optimal solutions found by an indirect optimization method, thus demonstrating the effectiveness of this approach. Next, two stochastic scenarios are analysed. In the first, the optical coefficients of the sail are assumed to be random variables with Gaussian distribution, which leads to random variations in the sail characteristic acceleration. In the second scenario, wrinkles on the sail membrane are taken into account, resulting in a misalignment of the thrust vector with respect to a perfectly smooth surface. Both phenomena are modelled based on experimental measurements available in the literature in order to perform realistic analyses. In the stochastic scenarios, Monte Carlo simulations are performed using the trained policies, demonstrating that the reinforcement learning approach is capable of finding near time-optimal solutions, while also being robust to the sources of uncertainty considered.
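The two stochastic scenarios can be illustrated with a minimal Monte Carlo sketch. This is not the authors' model: it assumes an ideal flat sail in a planar setting, whereas the paper relies on an optical force model and experimental data. It represents the optical uncertainty as a Gaussian characteristic acceleration and the wrinkle effect as a small Gaussian rotation of the thrust direction. The function names, the 2% spread, and the 1 degree misalignment are hypothetical values chosen only for illustration.

```python
import numpy as np


def sail_acceleration(r_au, alpha, a_c, delta=0.0):
    """Radial and transverse acceleration of an ideal flat sail (planar case).

    r_au  : Sun-sail distance in astronomical units
    alpha : commanded cone angle [rad] between the Sun-sail line and the sail normal
    a_c   : characteristic acceleration at 1 au (any consistent unit, e.g. mm/s^2)
    delta : thrust misalignment angle [rad]; an assumed stand-in for wrinkle effects
    """
    # Ideal-sail thrust magnitude scales with cos^2(alpha) and 1/r^2.
    magnitude = a_c * np.cos(alpha) ** 2 / r_au ** 2
    # Assumption: wrinkles only rotate the thrust direction by delta.
    direction = alpha + delta
    return magnitude * np.cos(direction), magnitude * np.sin(direction)


def monte_carlo_acceleration(r_au, alpha, a_c_mean, a_c_std, delta_std,
                             n_samples=10_000, seed=0):
    """Sample the perturbed acceleration and return simple statistics."""
    rng = np.random.default_rng(seed)
    # Scenario 1 (assumed form): Gaussian characteristic acceleration.
    a_c = rng.normal(a_c_mean, a_c_std, n_samples)
    # Scenario 2 (assumed form): zero-mean Gaussian thrust misalignment.
    delta = rng.normal(0.0, delta_std, n_samples)
    a_r, a_t = sail_acceleration(r_au, alpha, a_c, delta)
    return {
        "radial_mean": float(a_r.mean()),
        "radial_std": float(a_r.std()),
        "transverse_mean": float(a_t.mean()),
        "transverse_std": float(a_t.std()),
    }


if __name__ == "__main__":
    # Hypothetical values: 1 mm/s^2 nominal, 2% spread, 1 degree misalignment.
    stats = monte_carlo_acceleration(
        r_au=1.0,
        alpha=np.deg2rad(35.0),
        a_c_mean=1.0,
        a_c_std=0.02,
        delta_std=np.deg2rad(1.0),
    )
    for name, value in stats.items():
        print(f"{name}: {value:.4f}")
```

In a full robustness study, the same sampling would be applied at every step of a propagated trajectory flown by the trained policy, and the statistics of interest would be the flight time and the terminal error rather than the instantaneous acceleration.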

Source journal

Acta Astronautica (Engineering and Technology: Aerospace Engineering)

CiteScore: 7.20

Self-citation rate: 22.90%

Articles published: 599

Review time: 53 days

Journal description: Acta Astronautica is sponsored by the International Academy of Astronautics. Content is based on original contributions in all fields of basic, engineering, life and social space sciences and of space technology related to the peaceful scientific exploration of space; its exploitation for human welfare and progress; and the conception, design, development and operation of space-borne and Earth-based systems. In addition to regular issues, the journal publishes selected proceedings of the annual International Astronautical Congress (IAC), transactions of the IAA, and special issues on topics of current interest, such as microgravity, space station technology, geostationary orbits, and space economics. Other subject areas include satellite technology, space transportation and communications, space energy, power and propulsion, astrodynamics, extraterrestrial intelligence, and Earth observations.