Energy-efficient resource allocation based on deep reinforcement learning for massive MIMO-NOMA-SWIPT system

Impact Factor: 3.2 | CAS Tier 3 (Computer Science) | JCR Q2 (Engineering, Electrical & Electronic)
Yi Luo, Shaochen Zhang, Qintuya Si, Xin Zhao, Yang Liu, Tianshuang Qiu
{"title":"Energy-efficient resource allocation based on deep reinforcement learning for massive MIMO-NOMA-SWIPT system","authors":"Yi Luo ,&nbsp;Shaochen Zhang ,&nbsp;Qintuya Si ,&nbsp;Xin Zhao ,&nbsp;Yang Liu ,&nbsp;Tianshuang Qiu","doi":"10.1016/j.aeue.2025.155947","DOIUrl":null,"url":null,"abstract":"<div><div>Massive multiple-input multiple-output (MIMO) and non-orthogonal multiple access (NOMA) can improve the energy efficiency (EE) of simultaneous wireless information and power transfer (SWIPT) systems by jointly optimizing the power allocation coefficient and the time slot handover coefficient. However, improper allocation of resources severely restricts the improvement of EE and substantially elevates communication costs of SWIPT. To address these issues, this study proposes a joint resource allocation algorithm for optimizing user scheduling, power allocation, and power splitting based on distributed multi-agent double deep Q-network and double multi-agent deep deterministic policy gradient (MADDQN-DMADDPG) network. Specifically, a double deep Q-network (DDQN) is used to optimize user scheduling by decoupling the action selection from action evaluation, which resolves the Q-value overestimation and mitigates multi-user interference. Then, to improve the training stability, a deep deterministic policy gradient (DDPG) is utilized to optimize the power allocation and power splitting by leveraging its ability to handle continuous action spaces. Moreover, we introduce the distributed multi-agent learning to bolster the learning capabilities of DDQN and DDPG, ensuring accuracy and efficiency of resource allocation. Simulations demonstrate that the proposed algorithm can significantly improve the overall EE of system with fast convergence speed.</div></div>","PeriodicalId":50844,"journal":{"name":"Aeu-International Journal of Electronics and Communications","volume":"201 ","pages":"Article 155947"},"PeriodicalIF":3.2000,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Aeu-International Journal of Electronics and Communications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1434841125002882","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Massive multiple-input multiple-output (MIMO) and non-orthogonal multiple access (NOMA) can improve the energy efficiency (EE) of simultaneous wireless information and power transfer (SWIPT) systems by jointly optimizing the power allocation coefficient and the time slot handover coefficient. However, improper allocation of resources severely restricts the improvement of EE and substantially elevates the communication costs of SWIPT. To address these issues, this study proposes a joint resource allocation algorithm for optimizing user scheduling, power allocation, and power splitting based on a distributed multi-agent double deep Q-network and double multi-agent deep deterministic policy gradient (MADDQN-DMADDPG) network. Specifically, a double deep Q-network (DDQN) is used to optimize user scheduling by decoupling action selection from action evaluation, which resolves Q-value overestimation and mitigates multi-user interference. Then, to improve training stability, a deep deterministic policy gradient (DDPG) is utilized to optimize the power allocation and power splitting by leveraging its ability to handle continuous action spaces. Moreover, we introduce distributed multi-agent learning to bolster the learning capabilities of DDQN and DDPG, ensuring the accuracy and efficiency of resource allocation. Simulations demonstrate that the proposed algorithm can significantly improve the overall EE of the system with fast convergence speed.
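As a hedged illustration of the decoupling the abstract describes (not the authors' implementation), the minimal sketch below contrasts the standard DQN target, where the target network both selects and evaluates the greedy action, with the double-DQN target, where the online network selects the action and the target network evaluates it. The Q-value arrays, reward, and discount factor are placeholder values chosen only for the example.

```python
import numpy as np

# Hypothetical Q-value estimates for one next state over four candidate
# user-scheduling actions (placeholder numbers, not from the paper).
q_online = np.array([1.2, 0.7, 1.9, 0.4])   # online network Q(s', a)
q_target = np.array([1.0, 0.9, 1.5, 0.6])   # target network Q'(s', a)
reward, gamma = 0.8, 0.95

# Standard DQN target: the target network both selects and evaluates
# the greedy action, which tends to overestimate Q-values.
y_dqn = reward + gamma * q_target.max()

# Double DQN target: the online network selects the greedy action and
# the target network evaluates it -- the decoupling the abstract cites.
a_star = int(np.argmax(q_online))
y_ddqn = reward + gamma * q_target[a_star]

print(f"DQN target: {y_dqn:.3f}, Double DQN target: {y_ddqn:.3f}")
```

In this toy case the double-DQN target is lower because the action preferred by the online network is valued more conservatively by the target network, which is the mechanism that curbs overestimation; the continuous power allocation and power splitting variables would instead be handled by the DDPG actor, as stated in the abstract.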
Source journal
CiteScore: 6.90
Self-citation rate: 18.80%
Articles published: 292
Review time: 4.9 months
Journal description: AEÜ is an international scientific journal which publishes both original works and invited tutorials. The journal's scope covers all aspects of theory and design of circuits, systems and devices for electronics, signal processing, and communication, including signal and system theory and digital signal processing; network theory and circuit design; information theory, communication theory and techniques, modulation, source and channel coding; switching theory and techniques, communication protocols; optical communications; microwave theory and techniques, radar, sonar; antennas and wave propagation. AEÜ publishes full papers and letters with very short turnaround time but a high-standard review process. Review cycles are typically finished within twelve weeks by application of modern electronic communication facilities.