Authors: Haowei Meng; Ning Xin; Hao Qin; Di Zhao
DOI: 10.23919/cje.2022.00.135
Journal: Chinese Journal of Electronics, vol. 33, no. 5, pp. 1286-1295 (JCR Q3, Engineering, Electrical & Electronic; impact factor 1.6)
Publication date: 2024-09-09
Publication type: Journal Article
Link: https://ieeexplore.ieee.org/document/10669755/
A Recursive DRL-Based Resource Allocation Method for Multibeam Satellite Communication Systems
Optimization-based radio resource management (RRM) has shown significant performance gains on high-throughput satellites (HTSs). However, as the number of allocable on-board resources increases, traditional RRM becomes difficult to apply in real satellite systems because of its high computational complexity. Deep reinforcement learning (DRL) is a promising solution to the resource allocation problem because it is model-free. Nevertheless, the action space faced by DRL grows exponentially with the communication scale, which leads to an excessive exploration cost. In this paper, we propose a recursive frequency resource allocation algorithm based on long short-term memory (LSTM) and proximal policy optimization (PPO), called PPO-RA-LOOP, where RA stands for resource allocation and LOOP indicates that the algorithm outputs actions recursively. Specifically, the PPO algorithm uses an LSTM network to recursively generate a frequency resource allocation sub-action for each beam, which significantly reduces the action space. In addition, the LSTM-based recursive architecture lets PPO allocate the next frequency resource using the already generated sub-actions as prior knowledge, which reduces the complexity of the neural network. Simulation results show that PPO-RA-LOOP achieves higher spectral efficiency and system satisfaction than other frequency allocation algorithms.
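The action-space argument in the abstract can be illustrated with a small sketch. It is not the authors' PPO-RA-LOOP implementation: the abstract does not give the state, action, or reward definitions, so everything here is an assumption made for illustration. In particular, the beam and channel counts are hypothetical, a plain usage counter stands in for the LSTM hidden state, and a hand-written rule that discourages channel reuse stands in for the learned PPO policy. With 10 beams and 4 channels, a flat policy would have to score 4^10 = 1,048,576 joint actions at once, whereas the recursive form scores only 4 options per step over 10 steps.

```python
import math
import random


def joint_action_count(num_beams, num_channels):
    """Size of the flat action space when all beams are assigned in one shot."""
    return num_channels ** num_beams


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_recursive_allocation(num_beams, num_channels, rng, reuse_penalty=2.0):
    """Sample one channel per beam, one beam at a time.

    ASSUMPTION: `usage` is a toy stand-in for a recurrent (LSTM) state. It
    summarises the sub-actions already emitted so that later beams can
    condition on them. The logits rule is hypothetical, not a trained policy.
    """
    usage = [0] * num_channels
    allocation = []
    for _ in range(num_beams):
        # Channels that earlier beams already took get lower logits.
        logits = [-reuse_penalty * count for count in usage]
        probs = softmax(logits)
        channel = rng.choices(range(num_channels), weights=probs, k=1)[0]
        allocation.append(channel)
        usage[channel] += 1  # feed the sub-action back into the state
    return allocation


if __name__ == "__main__":
    beams, channels = 10, 4  # hypothetical sizes
    print("flat joint actions:", joint_action_count(beams, channels))
    print("options scored per recursive step:", channels)
    rng = random.Random(0)
    print("sample allocation:", sample_recursive_allocation(beams, channels, rng))
```

The design point is that each step's output layer only needs `num_channels` entries, and the information about earlier sub-actions reaches later steps through the carried state rather than through a larger output.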
About the journal:
CJE focuses on the emerging fields of electronics, publishing innovative and transformative research papers. Most of the papers published in CJE are from universities and research institutes, presenting their innovative research results. Both theoretical and practical contributions are encouraged, and original research papers reporting novel solutions to the hot topics in electronics are strongly recommended.