A Recursive DRL-Based Resource Allocation Method for Multibeam Satellite Communication Systems

IF 1.6 | Zone 4 (Computer Science) | JCR Q3 | ENGINEERING, ELECTRICAL & ELECTRONIC

Chinese Journal of Electronics, vol. 33, no. 5, pp. 1286-1295 | Pub Date: 2024-09-09 | DOI: 10.23919/cje.2022.00.135

Haowei Meng;Ning Xin;Hao Qin;Di Zhao


Citations: 0


A Recursive DRL-Based Resource Allocation Method for Multibeam Satellite Communication Systems

Optimization-based radio resource management (RRM) has shown significant performance gains on high-throughput satellites (HTSs). However, as the number of allocable on-board resources increases, traditional RRM becomes difficult to apply in real satellite systems because of its high computational complexity. Deep reinforcement learning (DRL) is a promising solution to the resource allocation problem because it is model-free. Nevertheless, the action space faced by DRL grows exponentially with the communication scale, which makes exploration excessively costly. In this paper, we propose a recursive frequency resource allocation algorithm based on long short-term memory (LSTM) and proximal policy optimization (PPO), called PPO-RA-LOOP, where RA stands for resource allocation and LOOP indicates that the algorithm outputs actions recursively. Specifically, the PPO algorithm uses an LSTM network to recursively generate a frequency resource allocation sub-action for each beam, which significantly reduces the action space. In addition, the LSTM-based recursive architecture lets PPO use the already generated sub-actions as prior knowledge when allocating the next frequency resource, which reduces the complexity of the neural network. Simulation results show that PPO-RA-LOOP achieves higher spectral efficiency and system satisfaction than other frequency allocation algorithms.
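The recursive idea in the abstract can be illustrated with a small sketch. This is not the authors' PPO-RA-LOOP implementation: the class `TinyLSTMPolicy`, the state layout (one demand value per beam), the sizes, and the untrained random weights are all assumptions made for illustration, and PPO training is omitted entirely. The sketch only shows how an LSTM cell can emit one sub-action per beam, feeding each chosen sub-action back in as input for the next step, so that the policy head has `num_choices` outputs rather than one output per joint allocation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

class TinyLSTMPolicy:
    """Hypothetical autoregressive policy: one frequency sub-action per beam."""

    def __init__(self, state_dim, num_choices, hidden_dim, rng):
        # Input = system state + one-hot encoding of the previous sub-action.
        in_dim = state_dim + num_choices
        self.num_choices = num_choices
        self.hidden_dim = hidden_dim
        # One stacked weight matrix for the four LSTM gates.
        self.W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, in_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Linear head mapping the hidden state to logits over frequency choices.
        self.W_out = rng.normal(0.0, 0.1, size=(num_choices, hidden_dim))

    def step(self, x, h, c):
        # Standard LSTM cell update (input, forget, candidate, output gates).
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

    def sample_allocation(self, state, num_beams, rng):
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        prev = np.zeros(self.num_choices)  # no sub-action chosen yet
        actions, log_prob = [], 0.0
        for _ in range(num_beams):
            # Earlier sub-actions reach this step through h, c and `prev`.
            h, c = self.step(np.concatenate([state, prev]), h, c)
            probs = softmax(self.W_out @ h)
            a = int(rng.choice(self.num_choices, p=probs))
            log_prob += float(np.log(probs[a]))  # joint log-prob is the sum
            actions.append(a)
            prev = np.zeros(self.num_choices)
            prev[a] = 1.0
        return actions, log_prob

num_beams, num_choices = 8, 4  # illustrative sizes, not taken from the paper
rng = np.random.default_rng(0)
policy = TinyLSTMPolicy(state_dim=num_beams, num_choices=num_choices,
                        hidden_dim=16, rng=rng)
state = rng.random(num_beams)  # placeholder per-beam traffic demand
actions, log_prob = policy.sample_allocation(state, num_beams, rng)

joint_space = num_choices ** num_beams  # outputs needed by a flat joint policy
per_step_outputs = num_choices          # outputs needed per recursive step
print(actions, log_prob, joint_space, per_step_outputs)
```

With these illustrative sizes, a flat policy would have to score 4^8 = 65536 joint allocations, whereas the recursive policy scores 4 choices at each of 8 steps. The summed log-probability is the quantity a PPO-style update would typically use for the probability ratio, although how the paper defines its loss is not stated in the abstract.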


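Source journal

Chinese Journal of Electronics (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore: 3.70

Self-citation rate: 16.70%

Articles published: 342

Review time: 12.0 months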

Journal description: CJE focuses on the emerging fields of electronics, publishing innovative and transformative research papers. Most of the papers published in CJE are from universities and research institutes, presenting their innovative research results. Both theoretical and practical contributions are encouraged, and original research papers reporting novel solutions to the hot topics in electronics are strongly recommended.