The pointer network for reward maximisation in multi-target space mission sequence selection
Authors: Edward Tomanek-Volynets, Matteo Ceriotti
Journal: Advances in Space Research, vol. 75, no. 12, pp. 8687-8706 (published 2025-04-22)
DOI: 10.1016/j.asr.2025.04.045
URL: https://www.sciencedirect.com/science/article/pii/S0273117725003990
Journal impact factor: 2.8 (JCR Q2, Astronomy & Astrophysics)
Multi-target space mission scenarios, such as asteroid rendezvous, debris removal or satellite servicing, require targeting several orbits in a single mission, often selected from a large set, and therefore require choosing optimal sequences of orbits to visit. This paper demonstrates a reinforcement-learning-based framework for selecting the sequence of targets to be visited in large-scale multi-target mission optimisation problems. Sequence selection is an NP-hard combinatorial optimisation problem. The proposed method builds upon the pointer network, a neural network architecture for combinatorial optimisation originally developed for Euclidean problems, to produce estimates of the optimal sequence of targets in very short amounts of time. The neural network is trained using a policy-gradient reinforcement-learning approach. Once training is complete, the network can be evaluated in two ways: greedy decoding produces solutions on average 15 % less optimal than Ant Colony Optimisation (ACO); stochastic search produces solutions on average 5 % less optimal than ACO, using an iterative process that is slower than greedy decoding but still orders of magnitude faster than ACO. The quality of the network's solutions is shown both averaged over a large number of problems and demonstrated more closely on a few specific instances.
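The difference between the two evaluation modes can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy problem with 2-D target positions and Euclidean transfer costs, whereas the paper's setting involves orbits. A hand-written score (`pointer_probs`) stands in for the trained network, and "stochastic search" is approximated as best-of-N sampling seeded with the greedy solution. All function names, parameters and data below are hypothetical.

```python
import math
import random

def transfer_cost(a, b):
    """Toy stand-in for the cost of a transfer between two targets."""
    return math.dist(a, b)

def sequence_reward(seq, positions, values):
    """Total value collected minus total transfer cost along the sequence."""
    reward = sum(values[i] for i in seq)
    for a, b in zip(seq, seq[1:]):
        reward -= transfer_cost(positions[a], positions[b])
    return reward

def pointer_probs(current, visited, positions, values):
    """Masked softmax over targets (assumed score, not a trained network)."""
    scores = []
    for j in range(len(positions)):
        if j in visited:
            scores.append(-math.inf)  # mask: visited targets get probability 0
        elif current is None:
            scores.append(values[j])
        else:
            cost = transfer_cost(positions[current], positions[j])
            scores.append(values[j] - cost)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def decode(positions, values, length, rng=None):
    """Build one sequence: greedy if rng is None, otherwise sampled."""
    seq, visited, current = [], set(), None
    for _ in range(length):
        probs = pointer_probs(current, visited, positions, values)
        if rng is None:
            nxt = max(range(len(probs)), key=probs.__getitem__)
        else:
            nxt = rng.choices(range(len(probs)), weights=probs)[0]
        seq.append(nxt)
        visited.add(nxt)
        current = nxt
    return seq

def stochastic_search(positions, values, length, samples=200, seed=0):
    """Sample many sequences and keep the best, starting from the greedy one."""
    rng = random.Random(seed)
    best = decode(positions, values, length)
    best_reward = sequence_reward(best, positions, values)
    for _ in range(samples):
        cand = decode(positions, values, length, rng)
        cand_reward = sequence_reward(cand, positions, values)
        if cand_reward > best_reward:
            best, best_reward = cand, cand_reward
    return best, best_reward

# Hypothetical toy instance: 6 targets, choose and order 4 of them.
positions = [(0.0, 0.0), (1.0, 0.2), (0.9, 1.1), (0.1, 1.0), (2.0, 2.0), (0.5, 0.5)]
values = [1.0, 1.2, 0.8, 1.5, 2.0, 0.6]

greedy_seq = decode(positions, values, 4)
greedy_reward = sequence_reward(greedy_seq, positions, values)
search_seq, search_reward = stochastic_search(positions, values, 4)
print("greedy:", greedy_seq, round(greedy_reward, 3))
print("search:", search_seq, round(search_reward, 3))
```

Greedy decoding needs a single pass through the policy, while the sampling loop costs `samples` passes. Under these assumptions, that mirrors the speed-versus-quality trade-off described in the abstract. Because the search starts from the greedy sequence, its reward can never be lower than the greedy one in this sketch.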
About the journal:
The COSPAR publication Advances in Space Research (ASR) is an open journal covering all areas of space research including: space studies of the Earth's surface, meteorology, climate, the Earth-Moon system, planets and small bodies of the solar system, upper atmospheres, ionospheres and magnetospheres of the Earth and planets including reference atmospheres, space plasmas in the solar system, astrophysics from space, materials sciences in space, fundamental physics in space, space debris, space weather, Earth observations of space phenomena, etc.
NB: Manuscripts on life sciences as related to space are no longer accepted for submission to Advances in Space Research. Such manuscripts should now be submitted to the new COSPAR journal Life Sciences in Space Research (LSSR).
All submissions are reviewed by two scientists in the field. COSPAR is an interdisciplinary scientific organization concerned with the progress of space research on an international scale. Operating under the rules of ICSU, COSPAR ignores political considerations and considers all questions solely from the scientific viewpoint.