The pointer network for reward maximisation in multi-target space mission sequence selection

IF 2.8 | Zone 3 (Earth Sciences) | Q2 Astronomy & Astrophysics
Edward Tomanek-Volynets, Matteo Ceriotti
Advances in Space Research, volume 75, issue 12, pages 8687-8706. Publication date: 2025-04-22. Journal article.
DOI: 10.1016/j.asr.2025.04.045
https://www.sciencedirect.com/science/article/pii/S0273117725003990
Citations: 0

Abstract

The pointer network for reward maximisation in multi-target space mission sequence selection
Multi-target space mission scenarios, such as asteroid rendezvous, debris removal or satellite servicing, require targeting several orbits in a single mission. These orbits often have to be selected from a large set, so an optimal sequence of orbits to visit must be chosen. This paper demonstrates a reinforcement-learning-based framework for selecting the sequence of targets to be visited in large-scale multi-target mission optimisation problems. Sequence selection is an NP-hard combinatorial optimisation problem. The proposed method builds upon a neural network architecture for combinatorial optimisation, originally developed for Euclidean problems, to produce estimates of the optimal sequence of targets in very short amounts of time. The neural network is trained using a policy-gradient reinforcement-learning approach. Once training is complete, the network can be evaluated in two ways. Greedy decoding produces solutions that are on average 15 % less optimal than those of Ant Colony Optimisation (ACO). Stochastic search produces solutions that are on average 5 % less optimal than ACO, using an iterative process that is slower than greedy decoding but still orders of magnitude faster than ACO. The quality of the network’s solutions is shown both averaged over a large number of problems and examined more closely on a few specific instances.
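The difference between the two evaluation modes named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: `toy_scores` is a hand-written stand-in for the trained pointer network's output, the cost matrix uses planar distances as a placeholder for transfer cost, and the budget constraint and the choice to seed the search with the greedy solution are hypothetical.

```python
import math
import random


def toy_scores(current, candidates, reward, cost):
    # Hypothetical stand-in for the trained network's pointer scores:
    # reward per unit of (placeholder) transfer cost.
    return [reward[t] / (1.0 + cost[current][t]) for t in candidates]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def decode(reward, cost, budget, rng=None):
    """Build one sequence starting at target 0.

    Greedy decoding (rng is None) always takes the most probable next target;
    otherwise the next target is sampled from the policy distribution.
    """
    seq, current, spent = [0], 0, 0.0
    unvisited = list(range(1, len(reward)))
    while True:
        feasible = [t for t in unvisited if spent + cost[current][t] <= budget]
        if not feasible:
            break
        probs = softmax(toy_scores(current, feasible, reward, cost))
        if rng is None:
            nxt = feasible[max(range(len(feasible)), key=probs.__getitem__)]
        else:
            nxt = rng.choices(feasible, weights=probs, k=1)[0]
        spent += cost[current][nxt]
        seq.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return seq, sum(reward[t] for t in seq[1:])


def stochastic_search(reward, cost, budget, samples, rng):
    # Assumed variant: start from the greedy solution, then keep the best
    # of `samples` sequences drawn from the same policy.
    best = decode(reward, cost, budget)
    for _ in range(samples):
        candidate = decode(reward, cost, budget, rng)
        if candidate[1] > best[1]:
            best = candidate
    return best


# Toy instance: 8 targets with random rewards and planar distances as cost.
rng = random.Random(0)
n = 8
reward = [0.0] + [rng.uniform(1, 10) for _ in range(n - 1)]
points = [(rng.random(), rng.random()) for _ in range(n)]
cost = [[math.dist(p, q) for q in points] for p in points]

greedy_seq, greedy_total = decode(reward, cost, budget=1.5)
best_seq, best_total = stochastic_search(reward, cost, budget=1.5, samples=200, rng=rng)
print("greedy:    ", greedy_seq, round(greedy_total, 2))
print("stochastic:", best_seq, round(best_total, 2))
```

Greedy decoding needs a single pass through the policy, while the sampling loop costs `samples` additional passes. This mirrors the trade-off described in the abstract: the stochastic search is slower than greedy decoding but can return a higher-reward sequence.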
Source journal
Advances in Space Research (category: Earth Science and Astronomy, Geosciences Multidisciplinary)
CiteScore: 5.20
Self-citation rate: 11.50%
Articles published: 800
Review time: 5.8 months
Journal description: The COSPAR publication Advances in Space Research (ASR) is an open journal covering all areas of space research including: space studies of the Earth's surface, meteorology, climate, the Earth-Moon system, planets and small bodies of the solar system, upper atmospheres, ionospheres and magnetospheres of the Earth and planets including reference atmospheres, space plasmas in the solar system, astrophysics from space, materials sciences in space, fundamental physics in space, space debris, space weather, Earth observations of space phenomena, etc. NB: Manuscripts on life sciences as related to space are no longer accepted for submission to Advances in Space Research. Such manuscripts should now be submitted to the new COSPAR journal Life Sciences in Space Research (LSSR). All submissions are reviewed by two scientists in the field. COSPAR is an interdisciplinary scientific organization concerned with the progress of space research on an international scale. Operating under the rules of ICSU, COSPAR ignores political considerations and considers all questions solely from the scientific viewpoint.