Energy-efficient resource allocation based on deep reinforcement learning for massive MIMO-NOMA-SWIPT system

Impact Factor: 3.2 | CAS Tier 3 (Computer Science) | JCR Q2 (Engineering, Electrical & Electronic)
Yi Luo, Shaochen Zhang, Qintuya Si, Xin Zhao, Yang Liu, Tianshuang Qiu
{"title":"Energy-efficient resource allocation based on deep reinforcement learning for massive MIMO-NOMA-SWIPT system","authors":"Yi Luo ,&nbsp;Shaochen Zhang ,&nbsp;Qintuya Si ,&nbsp;Xin Zhao ,&nbsp;Yang Liu ,&nbsp;Tianshuang Qiu","doi":"10.1016/j.aeue.2025.155947","DOIUrl":null,"url":null,"abstract":"<div><div>Massive multiple-input multiple-output (MIMO) and non-orthogonal multiple access (NOMA) can improve the energy efficiency (EE) of simultaneous wireless information and power transfer (SWIPT) systems by jointly optimizing the power allocation coefficient and the time slot handover coefficient. However, improper allocation of resources severely restricts the improvement of EE and substantially elevates communication costs of SWIPT. To address these issues, this study proposes a joint resource allocation algorithm for optimizing user scheduling, power allocation, and power splitting based on distributed multi-agent double deep Q-network and double multi-agent deep deterministic policy gradient (MADDQN-DMADDPG) network. Specifically, a double deep Q-network (DDQN) is used to optimize user scheduling by decoupling the action selection from action evaluation, which resolves the Q-value overestimation and mitigates multi-user interference. Then, to improve the training stability, a deep deterministic policy gradient (DDPG) is utilized to optimize the power allocation and power splitting by leveraging its ability to handle continuous action spaces. Moreover, we introduce the distributed multi-agent learning to bolster the learning capabilities of DDQN and DDPG, ensuring accuracy and efficiency of resource allocation. Simulations demonstrate that the proposed algorithm can significantly improve the overall EE of system with fast convergence speed.</div></div>","PeriodicalId":50844,"journal":{"name":"Aeu-International Journal of Electronics and Communications","volume":"201 ","pages":"Article 155947"},"PeriodicalIF":3.2000,"publicationDate":"2025-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Aeu-International Journal of Electronics and Communications","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1434841125002882","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Massive multiple-input multiple-output (MIMO) and non-orthogonal multiple access (NOMA) can improve the energy efficiency (EE) of simultaneous wireless information and power transfer (SWIPT) systems by jointly optimizing the power allocation coefficient and the time slot handover coefficient. However, improper allocation of resources severely restricts the improvement of EE and substantially elevates the communication costs of SWIPT. To address these issues, this study proposes a joint resource allocation algorithm for optimizing user scheduling, power allocation, and power splitting based on a distributed multi-agent double deep Q-network and double multi-agent deep deterministic policy gradient (MADDQN-DMADDPG) network. Specifically, a double deep Q-network (DDQN) is used to optimize user scheduling by decoupling action selection from action evaluation, which resolves Q-value overestimation and mitigates multi-user interference. Then, to improve training stability, a deep deterministic policy gradient (DDPG) is utilized to optimize the power allocation and power splitting by leveraging its ability to handle continuous action spaces. Moreover, we introduce distributed multi-agent learning to bolster the learning capabilities of DDQN and DDPG, ensuring the accuracy and efficiency of resource allocation. Simulations demonstrate that the proposed algorithm can significantly improve the overall EE of the system with fast convergence speed.
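As a hedged illustration of the decoupling the abstract describes (not the authors' implementation), the minimal sketch below contrasts the standard DQN target, where the target network both selects and evaluates the greedy action, with the double-DQN target, where the online network selects the action and the target network evaluates it. The Q-value arrays, reward, and discount factor are placeholder values chosen only for the example.

```python
import numpy as np

# Hypothetical Q-value estimates for one next state over four candidate
# user-scheduling actions (placeholder numbers, not from the paper).
q_online = np.array([1.2, 0.7, 1.9, 0.4])   # online network Q(s', a)
q_target = np.array([1.0, 0.9, 1.5, 0.6])   # target network Q'(s', a)
reward, gamma = 0.8, 0.95

# Standard DQN target: the target network both selects and evaluates
# the greedy action, which tends to overestimate Q-values.
y_dqn = reward + gamma * q_target.max()

# Double DQN target: the online network selects the greedy action and
# the target network evaluates it -- the decoupling the abstract cites.
a_star = int(np.argmax(q_online))
y_ddqn = reward + gamma * q_target[a_star]

print(f"DQN target: {y_dqn:.3f}, Double DQN target: {y_ddqn:.3f}")
```

In this toy case the double-DQN target is lower because the action preferred by the online network is valued more conservatively by the target network, which is the mechanism that curbs overestimation; the continuous power allocation and power splitting variables would instead be handled by the DDPG actor, as stated in the abstract.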
Source journal
CiteScore: 6.90
Self-citation rate: 18.80%
Articles published: 292
Review time: 4.9 months
Journal description: AEÜ is an international scientific journal which publishes both original works and invited tutorials. The journal's scope covers all aspects of theory and design of circuits, systems and devices for electronics, signal processing, and communication, including signal and system theory and digital signal processing; network theory and circuit design; information theory, communication theory and techniques, modulation, source and channel coding; switching theory and techniques, communication protocols; optical communications; microwave theory and techniques, radar, sonar; antennas and wave propagation. AEÜ publishes full papers and letters with very short turnaround time but a high-standard review process. Review cycles are typically finished within twelve weeks by application of modern electronic communication facilities.