A distributed multi-agent deep reinforcement learning approach for dynamic beam hopping optimization in LEO mega-constellations
Authors: Kexin Chen, Xiaolu Liu, Wei Li
Journal: Swarm and Evolutionary Computation, volume 97, Article 102039 (Journal Article)
Publication date: 2025-06-30
DOI: 10.1016/j.swevo.2025.102039
URL: https://www.sciencedirect.com/science/article/pii/S221065022500197X
Impact factor: 8.5; JCR: Q1 (Computer Science, Artificial Intelligence); Region: 1 (Computer Science)
Open access: no
Citations: 0
Abstract
As low Earth orbit (LEO) satellite networks advance rapidly, conventional static beam allocation can no longer cope with dynamic traffic demands and uneven user distribution. To overcome these limitations, we propose a novel beam hopping scheduling approach designed for uncertain channel conditions and time-varying traffic requirements in LEO satellite systems. We first develop a multi-objective optimization model that balances two critical performance metrics of LEO satellite networks: throughput and delay. Building on this model, we formulate a locally interacting Markov game and rigorously prove that at least one Nash equilibrium exists, which establishes a theoretical basis for our approach. To solve the game, we introduce the Multi-Agent Deep Q-Network with Local Cooperative Rewards (MDQN-LCR) algorithm, in which satellites make decisions through a distributed Q-learning framework enhanced by a cooperative reward mechanism. Extensive simulations across diverse scenarios show that MDQN-LCR outperforms existing centralized methods, achieving 2.6% higher throughput in large-scale deployments and 13.5% lower transmission delay with short time slots. Our approach is also more stable, with a confidence interval 42% narrower than that of centralized QMIX, and its distributed architecture significantly reduces communication overhead. These properties make our solution particularly suitable for large constellations, offering a practical and scalable alternative for next-generation satellite communication systems.
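The abstract names a cooperative reward mechanism but does not give its formula, so the following is only a minimal sketch of one plausible form: each satellite's reward blends its own throughput/delay utility with the mean utility of its neighbours. The function names, the weights `w_delay` and `beta`, and the linear scalarisation are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List


def local_utility(throughput: float, delay: float, w_delay: float = 0.5) -> float:
    """Hypothetical scalarised objective: reward throughput, penalise delay."""
    return (1.0 - w_delay) * throughput - w_delay * delay


def cooperative_rewards(
    utilities: Dict[str, float],
    neighbours: Dict[str, List[str]],
    beta: float = 0.5,
) -> Dict[str, float]:
    """Blend each agent's own utility with the mean utility of its neighbours.

    beta = 0 gives purely selfish rewards; beta = 1 uses only the neighbours.
    This blending rule is an assumption, not the paper's definition.
    """
    rewards: Dict[str, float] = {}
    for agent, own in utilities.items():
        nbrs = neighbours.get(agent, [])
        if nbrs:
            nbr_mean = sum(utilities[n] for n in nbrs) / len(nbrs)
        else:
            nbr_mean = own  # isolated agent: fall back to its own utility
        rewards[agent] = (1.0 - beta) * own + beta * nbr_mean
    return rewards


if __name__ == "__main__":
    # Three hypothetical satellites; sat_c has no neighbour in range.
    utils = {
        "sat_a": local_utility(throughput=4.0, delay=2.0),
        "sat_b": local_utility(throughput=8.0, delay=2.0),
        "sat_c": local_utility(throughput=12.0, delay=2.0),
    }
    links = {"sat_a": ["sat_b"], "sat_b": ["sat_a", "sat_c"], "sat_c": []}
    print(cooperative_rewards(utils, links))
```

In a distributed Q-learning setup, each satellite would use its blended reward in place of its raw utility when updating its own Q-network, which requires exchanging utilities only with neighbours rather than with a central controller.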
About the journal:
Swarm and Evolutionary Computation is a peer-reviewed journal focused on the latest research and advancements in nature-inspired intelligent computation using swarm and evolutionary algorithms. It covers theoretical, experimental, and practical aspects of these paradigms and their hybrids, promoting interdisciplinary research. The journal prioritizes the publication of high-quality, original articles that push the boundaries of evolutionary computation and swarm intelligence. It also welcomes survey papers on current topics and novel applications. Topics of interest include but are not limited to: Genetic Algorithms and Genetic Programming, Evolution Strategies and Evolutionary Programming, Differential Evolution, Artificial Immune Systems, Particle Swarms, Ant Colony, Bacterial Foraging, Artificial Bees, Firefly Algorithm, Harmony Search, Artificial Life, Digital Organisms, Estimation of Distribution Algorithms, Stochastic Diffusion Search, Quantum Computing, Nano Computing, Membrane Computing, Human-centric Computing, Hybridization of Algorithms, Memetic Computing, Autonomic Computing, Self-organizing Systems, and Combinatorial, Discrete, Binary, Constrained, Multi-objective, Multi-modal, Dynamic, and Large-scale Optimization.