Single-Agent Reinforcement Learning for Scalable Earth-Observing Satellite Constellation Operations

IF 1.9 · Zone 4, Engineering and Technology · Q2 ENGINEERING, AEROSPACE

Journal of Spacecraft and Rockets · Pub Date: 2023-11-02 · DOI: 10.2514/1.a35736

Adam Herrmann, Mark A. Stephenson, Hanspeter Schaub


Citations: 0

Abstract

This work explores single-agent reinforcement learning for the multi-satellite agile Earth-observing scheduling problem. The objective of the problem is to maximize the weighted sum of imaging targets collected and downlinked while avoiding resource constraint violations on board the spacecraft. To avoid the computational complexity associated with multi-agent deep reinforcement learning while creating a robust and scalable solution, a policy is trained in a single satellite environment. This policy is then deployed on board each satellite in a Walker-delta constellation. A global set of targets is distributed to each satellite based on target access. The satellites communicate with one another to determine whether an imaging target is imaged or downlinked. Free communication, line-of-sight communication, and no communication are explored to determine how the communication assumptions and constellation design impact performance. Free communication is shown to produce the best performance, and no communication is shown to produce the worst performance. Line-of-sight communication performance is shown to depend heavily on the design of the constellation and how frequently the satellites can communicate with one another. To explore how higher-level coordination can impact performance, a centralized mixed-integer programming optimization approach to global target distribution is explored and compared to a decentralized approach. A genetic algorithm is also implemented for comparison purposes, and the proposed method is shown to achieve higher reward on average at a fraction of the computational cost.
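The coordination idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Satellite` class, the `share` and `weighted_reward` functions, and the target and satellite names are all hypothetical, and the trained policy, orbital access computation, and resource constraints are left out entirely. It only shows how the three communication assumptions change which targets a satellite still considers worth imaging, and how a weighted-sum reward over downlinked targets could be tallied.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Satellite:
    """Hypothetical stand-in for one satellite running the shared policy."""
    name: str
    accessible: set[str]  # targets assigned by access (assumed precomputed)
    known_imaged: set[str] = field(default_factory=set)

    def candidates(self) -> set[str]:
        # Targets this satellite still believes are worth imaging.
        return self.accessible - self.known_imaged

    def image(self, target: str) -> None:
        self.known_imaged.add(target)


def share(sats, mode, in_view=frozenset()):
    """Exchange imaged-target knowledge under one communication assumption.

    mode: "free" (all satellites exchange), "los" (only the name pairs listed
    in `in_view` exchange), or "none" (no exchange).
    """
    if mode == "none":
        return
    if mode == "free":
        union = set().union(*(s.known_imaged for s in sats))
        for s in sats:
            s.known_imaged |= union
        return
    if mode == "los":
        by_name = {s.name: s for s in sats}
        for a, b in in_view:
            merged = by_name[a].known_imaged | by_name[b].known_imaged
            by_name[a].known_imaged = set(merged)
            by_name[b].known_imaged = set(merged)
        return
    raise ValueError(f"unknown communication mode: {mode}")


def weighted_reward(weights, downlinked):
    """Weighted sum over the targets that were collected and downlinked."""
    return sum(weights[t] for t in downlinked)


if __name__ == "__main__":
    a = Satellite("A", {"t1", "t2"})
    b = Satellite("B", {"t2", "t3"})
    a.image("t2")
    share([a, b], "none")
    print(sorted(b.candidates()))  # B still plans to image t2
    share([a, b], "free")
    print(sorted(b.candidates()))  # B has learned that t2 is done
```

Without communication, satellite B keeps a target that A has already imaged in its candidate set, so effort can be duplicated; once knowledge is shared, the duplicate drops out. This is one plausible reading of why the abstract reports free communication performing best and no communication worst.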


Source journal

Journal of Spacecraft and Rockets (Engineering and Technology: Engineering, Aerospace)

CiteScore

3.60

Self-citation rate

18.80%

Articles published

185

Review time

4.5 months

Journal description: This Journal, which started it all back in 1963, is devoted to the advancement of the science and technology of astronautics and aeronautics through the dissemination of original archival research papers disclosing new theoretical developments and/or experimental results. The topics include aeroacoustics, aerodynamics, combustion, fundamentals of propulsion, fluid mechanics and reacting flows, fundamental aspects of the aerospace environment, hydrodynamics, lasers and associated phenomena, plasmas, research instrumentation and facilities, structural mechanics and materials, optimization, and thermomechanics and thermochemistry. Papers are also sought that review intensively the results of recent research developments on any of the topics listed above.