Dynamic Beam Pattern Based on Cooperation Multi-Agent VDN-D3QN for LEO Satellite Communication System

IF 6.7 | Zone 2 (Computer Science) | Q1 TELECOMMUNICATIONS

IEEE Transactions on Green Communications and Networking Pub Date : 2024-09-10 DOI:10.1109/TGCN.2024.3457242

Meng Meng;Bo Hu;Shanzhi Chen;Shaoli Kang

IEEE Transactions on Green Communications and Networking, vol. 9, no. 2, pp. 725-738. Published 2024-09-10. Not open access. https://ieeexplore.ieee.org/document/10674002/

Citations: 0

Abstract

Because LEO satellites provide cooperative coverage and traffic demand is non-uniform across beam positions, allocating limited beam and power resources to a massive number of beam positions flexibly and effectively is a challenge in beam-hopping LEO satellite communication systems. In existing beam-hopping schemes based on deep reinforcement learning, each agent can acquire state information only within the coverage area of one LEO satellite. We therefore propose a cooperative multi-agent framework that combines Value-Decomposition Networks with a Dueling Double Deep Q-Learning Network (VDN-D3QN) to generate dynamic beam-hopping patterns that ensure delay fairness and throughput among beam positions in LEO satellite communication systems. The proposed VDN-D3QN method is divided into a training phase and a test phase, and each agent is responsible for the beam-hopping pattern of only one LEO satellite. During the training phase, the agents use the Dueling Double Deep Q-Learning Network to learn to cooperate with one another so as to maximize system throughput and improve delay fairness among beam positions. The Value-Decomposition Network is then employed to learn the optimal policy in a centralized manner through interaction with the environment. In the test phase, the trained agents are deployed in a distributed manner, one per LEO satellite, to address the challenging problem of inter-satellite communication. Each trained agent decides on the dynamic beam-hopping pattern from the local state information available to it. The evaluation results demonstrate that the proposed multi-agent VDN-D3QN algorithm can handle the non-uniform traffic demand of multiple satellites simultaneously. The simulation results further indicate that the algorithm allocates resources intelligently to meet the requirements of beam positions and outperforms the baselines.
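The three building blocks named in the abstract (dueling Q-values, double Q-learning targets, and VDN's additive value decomposition) can be illustrated with a minimal NumPy sketch. This is a generic illustration of those standard techniques, not the authors' implementation: the abstract does not specify the state, action, or reward design or the network architecture, so all function names, array shapes, and the discount factor below are assumptions made for the example.

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    advantages = np.asarray(advantages, dtype=float)
    value = np.asarray(value, dtype=float)[..., None]
    # Subtracting the mean advantage keeps V and A identifiable.
    return value + advantages - advantages.mean(axis=-1, keepdims=True)

def double_q_next_values(q_online_next, q_target_next):
    """Double Q-learning: choose actions with the online net, score them with the target net."""
    q_online_next = np.asarray(q_online_next, dtype=float)
    q_target_next = np.asarray(q_target_next, dtype=float)
    greedy = q_online_next.argmax(axis=-1)  # one greedy action per agent
    return np.take_along_axis(q_target_next, greedy[:, None], axis=-1)[:, 0]

def vdn_joint_q(per_agent_q, actions):
    """VDN: the joint Q-value is the sum of each agent's Q for its chosen action."""
    per_agent_q = np.asarray(per_agent_q, dtype=float)
    actions = np.asarray(actions)
    chosen = np.take_along_axis(per_agent_q, actions[:, None], axis=-1)[:, 0]
    return chosen.sum()

def vdn_td_target(team_reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Centralized training target built from a shared (team) reward."""
    if done:
        return float(team_reward)
    next_values = double_q_next_values(q_online_next, q_target_next)
    return float(team_reward + gamma * next_values.sum())

def decentralized_actions(per_agent_q):
    """Test phase: each agent acts greedily on its own local Q-values."""
    return np.asarray(per_agent_q).argmax(axis=-1)
```

Because the joint value is a plain sum, maximizing each agent's own Q-value also maximizes the joint value, which is what allows centralized training followed by distributed, per-satellite execution in this sketch. Each row of `per_agent_q` would correspond to one satellite's agent and each column to one candidate beam-hopping action, under the assumed shapes.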

Source journal

IEEE Transactions on Green Communications and Networking (Computer Science: Computer Networks and Communications)

CiteScore: 9.30

Self-citation rate: 6.20%

Articles published: 181