Title: Throughput Maximization With an AoI Constraint in Energy Harvesting D2D-Enabled Cellular Networks: An MSRA-TD3 Approach
Authors: Xiaoying Liu; Jiaxiang Xu; Kechen Zheng; Guanglin Zhang; Jia Liu; Norio Shiratori
Journal: IEEE Transactions on Wireless Communications, vol. 24, no. 2, pp. 1448-1466
DOI: 10.1109/TWC.2024.3509475
Publication date: 2024-12-11
URL: https://ieeexplore.ieee.org/document/10791438/
Citations: 0
Abstract
The energy harvesting D2D-enabled cellular network (EH-DCN) has emerged as a promising approach to address the issues of energy supply and spectrum utilization. Most existing works focus mainly on throughput, while information freshness, which is critical to time-sensitive applications, has rarely been explored. Given these facts, we aim to develop an optimal mode selection and resource allocation (MSRA) policy that maximizes the long-term overall throughput of a time-varying dynamic EH-DCN, subject to an age of information (AoI) constraint. As the MSRA policy involves both continuous variables (i.e., bandwidth, power, and time allocations) and discrete variables (i.e., mode selection and channel allocation), the optimization problem is proved to be nonconvex and NP-hard. To solve this nonconvex NP-hard problem, we exploit a deep reinforcement learning (DRL) approach, called MSRA twin delayed deep deterministic policy gradient (MSRA-TD3). MSRA-TD3 employs a double-critic network structure to better fit the reward function, which effectively mitigates the Q-value overestimation of deep deterministic policy gradient (DDPG), a classical DRL algorithm. Notably, in the design of MSRA-TD3, we use the throughput of user equipment (UE) at the previous time slot as a state, bypassing the channel state information estimation required by the time-varying dynamic environment, and incorporate weighted throughput and AoI penalty terms into the reward function to evaluate both performance metrics. Simulations demonstrate that the proposed MSRA-TD3 algorithm achieves better throughput and AoI performance than the comparison DRL algorithms.
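To make the abstract's two key design points concrete, the sketch below illustrates (i) a reward that mixes throughput with an AoI penalty via weights, and (ii) TD3's clipped double-Q target, the mechanism that mitigates DDPG's Q-value overestimation by taking the minimum of two critic estimates. This is a minimal illustrative sketch, not the paper's implementation: the function names, the weight values, and the linear reward form are assumptions.

```python
# Illustrative sketch only; names, weights, and reward form are assumptions,
# not the exact design of MSRA-TD3.

def weighted_reward(throughput, aoi_penalty, w_tp=1.0, w_aoi=0.5):
    """Reward mixing throughput and an AoI penalty via weights (assumed form)."""
    return w_tp * throughput - w_aoi * aoi_penalty

def td3_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Clipped double-Q Bellman target: use the smaller of the two critics'
    next-state estimates, which counteracts the overestimation bias that a
    single-critic DDPG target suffers from."""
    q_min = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * q_min)
```

In practice the two critics are separate neural networks updated toward this shared target; taking the minimum makes the target pessimistic whenever the critics disagree.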
Journal Introduction:
The IEEE Transactions on Wireless Communications is a prestigious publication that showcases cutting-edge advancements in wireless communications. It welcomes both theoretical and practical contributions in various areas. The scope of the Transactions encompasses a wide range of topics, including modulation and coding, detection and estimation, propagation and channel characterization, and diversity techniques. The journal also emphasizes the physical and link layer communication aspects of network architectures and protocols.
The journal is also open to papers on specific or non-traditional topics tied to particular application areas, including simulation tools and methodologies, orthogonal frequency division multiplexing (OFDM), MIMO systems, and wireless-over-optical technologies.
Overall, the IEEE Transactions on Wireless Communications serves as a platform for high-quality manuscripts that push the boundaries of wireless communications and contribute to advancements in the field.