Title: Throughput Maximization With an AoI Constraint in Energy Harvesting D2D-Enabled Cellular Networks: An MSRA-TD3 Approach
Authors: Xiaoying Liu; Jiaxiang Xu; Kechen Zheng; Guanglin Zhang; Jia Liu; Norio Shiratori
Journal: IEEE Transactions on Wireless Communications, vol. 24, no. 2, pp. 1448-1466
DOI: 10.1109/TWC.2024.3509475
Publication date: 2024-12-11
URL: https://ieeexplore.ieee.org/document/10791438/
Citations: 0
Abstract
The energy harvesting D2D-enabled cellular network (EH-DCN) has emerged as a promising approach to address the issues of energy supply and spectrum utilization. Most existing works focus mainly on throughput, while information freshness, which is critical to time-sensitive applications, has rarely been explored. Given these facts, we aim to develop an optimal mode selection and resource allocation (MSRA) policy that maximizes the long-term overall throughput of a time-varying dynamic EH-DCN, subject to an age of information (AoI) constraint. As the MSRA policy involves both continuous variables (i.e., bandwidth, power, and time allocations) and discrete variables (i.e., mode selection and channel allocation), the optimization problem is proved to be nonconvex and NP-hard. To solve this nonconvex NP-hard problem, we exploit a deep reinforcement learning (DRL) approach, called MSRA twin delayed deep deterministic policy gradient (MSRA-TD3). MSRA-TD3 employs a double-critic network structure to better fit the reward function, which effectively mitigates the Q-value overestimation of deep deterministic policy gradient (DDPG), a classical DRL algorithm. Notably, in the design of MSRA-TD3, we use the throughput of user equipment (UE) at the previous time slot as a state, bypassing the channel state information estimation required by the time-varying dynamic environment, and incorporate weighted throughput and AoI penalty terms into the reward function to evaluate both performance metrics. Simulations demonstrate that the proposed MSRA-TD3 algorithm achieves better throughput and AoI performance than the comparison DRL algorithms.
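To make the abstract's two key design points concrete, the sketch below illustrates (i) a reward that mixes throughput with an AoI penalty via weights, and (ii) TD3's clipped double-Q target, the mechanism that mitigates DDPG's Q-value overestimation by taking the minimum of two critic estimates. This is a minimal illustrative sketch, not the paper's implementation: the function names, the weight values, and the linear reward form are assumptions.

```python
# Illustrative sketch only; names, weights, and reward form are assumptions,
# not the exact design of MSRA-TD3.

def weighted_reward(throughput, aoi_penalty, w_tp=1.0, w_aoi=0.5):
    """Reward mixing throughput and an AoI penalty via weights (assumed form)."""
    return w_tp * throughput - w_aoi * aoi_penalty

def td3_target(reward, q1_next, q2_next, gamma=0.99, done=False):
    """Clipped double-Q Bellman target: use the smaller of the two critics'
    next-state estimates, which counteracts the overestimation bias that a
    single-critic DDPG target suffers from."""
    q_min = min(q1_next, q2_next)
    return reward + (0.0 if done else gamma * q_min)
```

In practice the two critics are separate neural networks updated toward this shared target; taking the minimum makes the target pessimistic whenever the critics disagree.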
Journal Introduction:
The IEEE Transactions on Wireless Communications is a prestigious publication that showcases cutting-edge advancements in wireless communications. It welcomes both theoretical and practical contributions in various areas. The scope of the Transactions encompasses a wide range of topics, including modulation and coding, detection and estimation, propagation and channel characterization, and diversity techniques. The journal also emphasizes the physical and link layer communication aspects of network architectures and protocols.
The journal is also open to papers on specific or non-traditional topics tied to particular application areas, including simulation tools and methodologies, orthogonal frequency division multiplexing (OFDM), MIMO systems, and wireless-over-optical technologies.
Overall, the IEEE Transactions on Wireless Communications serves as a platform for high-quality manuscripts that push the boundaries of wireless communications and contribute to advancements in the field.