Reinforcement learning-based satellite formation attitude control under multi-constraint
Authors: Yingkai Cai, Kay-Soon Low, Zhaokui Wang
Journal: Advances in Space Research, volume 74, issue 11, pages 5819-5836 (Impact Factor 2.8; JCR Q2, Astronomy & Astrophysics)
Published: 2024-08-03
DOI: 10.1016/j.asr.2024.07.084
URL: https://www.sciencedirect.com/science/article/pii/S0273117724008032
As space missions grow more complex, the constraints on satellite attitude control become more stringent, particularly for satellites flying in formation. This paper introduces a method for attitude control of satellite formations under multiple constraints, based on categorizing and modeling the different constraints. The method uses Phased Priority Reinforcement Learning (PPRL), which builds on the Deep Deterministic Policy Gradient (DDPG) algorithm. To handle the complexity of the constraints and the high control dimensionality caused by multi-satellite coordination, it adopts a two-step training strategy. The first step addresses the multi-constraint problem for individual satellites; the second step assigns higher priority to that single-satellite experience data in its experience replay buffer to improve data utilization efficiency. To counter reward sparsity in complex, high-dimensional constraint models, a detailed reward mechanism incorporates both local and global constraints into the reward function, making attitude control both efficient and effective. In numerical simulations, the approach satisfies dynamic, state, and performance constraints and demonstrates adaptability and robustness. Compared with traditional methods, it achieves significant improvements in control performance and constraint satisfaction, offering a solution pathway for high-dimensional control problems in multi-constraint satellite formations.
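The two ideas the abstract names, raising the replay priority of single-satellite experience in the second training step and folding local and global constraints into one reward, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract gives neither the priority rule nor the reward terms, so the class `PriorityReplayBuffer`, the helpers `seed_from_phase_one` and `shaped_reward`, the `boost` factor, and all weights are hypothetical, and the DDPG agent itself is omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class PriorityReplayBuffer:
    """Proportional-priority replay buffer (illustrative, not the paper's code)."""
    capacity: int
    items: list = field(default_factory=list)       # stored transitions
    priorities: list = field(default_factory=list)  # one sampling weight per transition

    def add(self, transition, priority=1.0):
        # Drop the oldest entry once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(float(priority))

    def sample(self, batch_size, rng=random):
        # Draw with replacement, with probability proportional to priority.
        return rng.choices(self.items, weights=self.priorities, k=batch_size)


def seed_from_phase_one(buffer, phase_one_transitions, boost=5.0):
    """Copy single-satellite experience into the phase-two buffer with raised priority.

    `boost` is an assumed constant; the paper's actual priority rule is not given here.
    """
    for transition in phase_one_transitions:
        buffer.add(transition, priority=boost)


def shaped_reward(attitude_error, local_violations, global_violations,
                  w_err=1.0, w_local=2.0, w_global=4.0):
    """Dense reward: penalise pointing error plus per-satellite (local) and
    formation-level (global) constraint violations. Weights are assumptions."""
    return -(w_err * attitude_error
             + w_local * sum(local_violations)
             + w_global * sum(global_violations))


if __name__ == "__main__":
    buf = PriorityReplayBuffer(capacity=1000)
    seed_from_phase_one(buf, [("single", i) for i in range(10)])
    for i in range(10):
        buf.add(("formation", i))  # new formation experience, default priority 1.0
    batch = buf.sample(4, rng=random.Random(0))
    print(batch)
    print(shaped_reward(0.5, [0.0, 1.0], [0.25]))
```

Because every step returns a graded penalty instead of a single success signal at the end of an episode, the agent receives feedback even when some constraints are still violated, which is the usual motivation for dense reward shaping under sparse rewards.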
About the journal:
The COSPAR publication Advances in Space Research (ASR) is an open journal covering all areas of space research including: space studies of the Earth's surface, meteorology, climate, the Earth-Moon system, planets and small bodies of the solar system, upper atmospheres, ionospheres and magnetospheres of the Earth and planets including reference atmospheres, space plasmas in the solar system, astrophysics from space, materials sciences in space, fundamental physics in space, space debris, space weather, Earth observations of space phenomena, etc.
NB: Please note that manuscripts on space-related life sciences are no longer accepted for submission to Advances in Space Research. Such manuscripts should now be submitted to the new COSPAR journal Life Sciences in Space Research (LSSR).
All submissions are reviewed by two scientists in the field. COSPAR is an interdisciplinary scientific organization concerned with the progress of space research on an international scale. Operating under the rules of ICSU, COSPAR ignores political considerations and considers all questions solely from the scientific viewpoint.