The pointer network for reward maximisation in multi-target space mission sequence selection

IF 2.8 · Zone 3, Earth Sciences · Q2 ASTRONOMY & ASTROPHYSICS

Advances in Space Research Pub Date : 2025-04-22 DOI:10.1016/j.asr.2025.04.045

Edward Tomanek-Volynets, Matteo Ceriotti

Volume 75, Issue 12, Pages 8687-8706. Full text: https://www.sciencedirect.com/science/article/pii/S0273117725003990

Citations: 0

Abstract

Multi-target space missions, such as asteroid rendezvous, debris removal or satellite servicing, must visit several orbits in a single mission. These orbits are often selected from a large set, so an optimal visiting sequence has to be chosen. This paper demonstrates a reinforcement-learning-based framework for selecting the sequence of targets to visit in large-scale multi-target mission optimisation problems. Sequence selection is an NP-hard combinatorial optimisation problem. The proposed method builds upon a neural network architecture for combinatorial optimisation, originally developed for Euclidean problems, to produce estimates of the optimal sequence of targets in very short amounts of time. The neural network is trained using a policy-gradient reinforcement-learning approach. Once training is complete, the network can be evaluated in two ways. The first, greedy decoding, produces solutions that are on average 15% worse than those of Ant Colony Optimisation (ACO). The second, stochastic search, is on average 5% worse than ACO; it uses an iterative process that is slower than greedy decoding but still orders of magnitude faster than ACO. The quality of the network's solutions is shown both averaged over a large number of problems and examined more closely on a few specific instances.
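The difference between the two evaluation modes can be illustrated with a minimal Python sketch. It is not the authors' implementation: `toy_policy` is a hypothetical stand-in for the trained pointer network (it simply prefers cheap next transfers), `cost` is an assumed matrix of transfer costs, and `reward` is an assumed objective (negative total cost). The stochastic search shown here is one common reading of the term, namely sampling many sequences from the policy and keeping the best; the paper's actual procedure may differ.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_policy(current, remaining, cost):
    """Hypothetical stand-in for the trained network: cheaper transfers get higher probability."""
    return softmax([-cost[current][t] for t in remaining])


def decode(cost, start, length, rng=None):
    """Build a sequence of `length` targets: greedy if rng is None, otherwise sampled."""
    seq, current = [], start
    remaining = [t for t in range(len(cost)) if t != start]
    for _ in range(length):
        probs = toy_policy(current, remaining, cost)
        if rng is None:
            # Greedy decoding: always take the most probable next target.
            idx = max(range(len(remaining)), key=probs.__getitem__)
        else:
            # Stochastic decoding: sample the next target from the policy.
            idx = rng.choices(range(len(remaining)), weights=probs)[0]
        current = remaining.pop(idx)
        seq.append(current)
    return seq


def reward(seq, cost, start):
    """Assumed objective for illustration: negative total transfer cost."""
    total, current = 0.0, start
    for t in seq:
        total += cost[current][t]
        current = t
    return -total


def stochastic_search(cost, start, length, n_samples, seed=0):
    """Sample `n_samples` sequences and keep the best, starting from the greedy one."""
    rng = random.Random(seed)
    best = decode(cost, start, length)
    best_r = reward(best, cost, start)
    for _ in range(n_samples):
        cand = decode(cost, start, length, rng)
        r = reward(cand, cost, start)
        if r > best_r:
            best, best_r = cand, r
    return best, best_r
```

Because the search starts from the greedy sequence, its result is never worse than greedy decoding in this sketch; it costs `n_samples` additional forward passes, which mirrors the speed and quality trade-off described in the abstract.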

Source journal

Advances in Space Research (Earth Sciences and Astronomy: Geosciences, Multidisciplinary)

CiteScore

5.20

Self-citation rate

11.50%

Articles published

800

Review time

5.8 months

Journal description: The COSPAR publication Advances in Space Research (ASR) is an open journal covering all areas of space research including: space studies of the Earth's surface, meteorology, climate, the Earth-Moon system, planets and small bodies of the solar system, upper atmospheres, ionospheres and magnetospheres of the Earth and planets including reference atmospheres, space plasmas in the solar system, astrophysics from space, materials sciences in space, fundamental physics in space, space debris, space weather, Earth observations of space phenomena, etc. NB: Please note that manuscripts related to life sciences as related to space are no longer accepted for submission to Advances in Space Research. Such manuscripts should now be submitted to the new COSPAR journal Life Sciences in Space Research (LSSR). All submissions are reviewed by two scientists in the field. COSPAR is an interdisciplinary scientific organization concerned with the progress of space research on an international scale. Operating under the rules of ICSU, COSPAR ignores political considerations and considers all questions solely from the scientific viewpoint.