Optimization-driven Hierarchical Deep Reinforcement Learning for Hybrid Relaying Communications
Y. Zou, Yutong Xie, Canhui Zhang, Shimin Gong, D. Hoang, D. Niyato
2020 IEEE Wireless Communications and Networking Conference (WCNC)
Published: 2020-05-01
DOI: 10.1109/WCNC45663.2020.9120470
Citations: 8
Abstract
In this paper, we employ multiple wireless-powered user devices as wireless relays to assist information transmission from a multi-antenna access point to a single-antenna receiver. To improve energy efficiency, we design a hybrid relaying communication strategy in which each wireless relay operates in either the passive mode via backscatter communications or the active mode via RF communications, depending on its channel conditions and energy state. We aim to maximize the overall SNR by jointly optimizing the access point’s beamforming strategy as well as the individual relays’ radio modes and operating parameters. Because the SNR maximization problem has a non-convex and combinatorial structure, we develop a deep reinforcement learning approach that adapts the beamforming and relaying strategies dynamically. In particular, we propose a novel optimization-driven hierarchical deep deterministic policy gradient (H-DDPG) approach that integrates model-based optimization into the conventional DDPG framework. It handles the discrete relay mode selection in an outer loop using the deep Q-network (DQN) algorithm and then optimizes the continuous beamforming and the relays’ operating parameters in an inner loop using the DDPG algorithm. Simulation results reveal that H-DDPG is robust to the hyperparameters and can speed up the learning process compared to the conventional DDPG approach.
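The two-level decomposition described in the abstract can be illustrated with a deliberately small toy, given here as a sketch under stated assumptions rather than the authors' algorithm. To keep it self-contained, a tabular epsilon-greedy value estimate stands in for the DQN and random-perturbation hill climbing stands in for the DDPG; no neural networks are involved. The reward function `toy_snr`, the channel gains, the number of relays, and every function and parameter name are hypothetical and do not come from the paper.

```python
import itertools
import random


def toy_snr(modes, params, gains):
    """Hypothetical SNR surrogate (NOT the paper's signal model).

    modes[i] is 0 (passive, backscatter) or 1 (active, RF);
    params[i] in [0, 1] is a continuous operating parameter of relay i.
    """
    total = 0.0
    for m, p, g in zip(modes, params, gains):
        if m == 1:
            # Active relay: gain grows with p, minus a quadratic energy cost.
            total += g * p - 0.5 * p * p
        else:
            # Passive relay: smaller gain, peaked at p = 0.5.
            total += 0.3 * g * (1.0 - (2.0 * p - 1.0) ** 2)
    return total


def inner_loop(modes, gains, rng, steps=200, sigma=0.1):
    """Continuous search for fixed modes (hill climbing stands in for DDPG)."""
    best = [rng.random() for _ in modes]
    best_val = toy_snr(modes, best, gains)
    for _ in range(steps):
        # Perturb every parameter and clip back into [0, 1].
        cand = [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in best]
        val = toy_snr(modes, cand, gains)
        if val > best_val:
            best, best_val = cand, val
    return best, best_val


def outer_loop(gains, rng, episodes=60, eps=0.2, lr=0.5):
    """Discrete mode selection (tabular epsilon-greedy stands in for DQN)."""
    actions = list(itertools.product([0, 1], repeat=len(gains)))
    q = {a: 0.0 for a in actions}
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.choice(actions)          # explore a random mode vector
        else:
            a = max(actions, key=q.get)      # exploit the current estimate
        # The outer reward is the value reached by the inner continuous search.
        _, reward = inner_loop(a, gains, rng)
        q[a] += lr * (reward - q[a])
    best_modes = max(actions, key=q.get)
    best_params, best_val = inner_loop(best_modes, gains, rng)
    return best_modes, best_params, best_val


rng = random.Random(0)
gains = [1.2, 0.4, 0.9]  # hypothetical channel gains, one per relay
modes, params, snr = outer_loop(gains, rng)
print("modes:", modes, "params:", params, "toy SNR:", snr)
```

The point of the sketch is the structure: the outer learner only ever sees a discrete choice and a scalar reward, while the continuous variables are resolved inside each outer step. This mirrors the split between mode selection and parameter optimization that the abstract attributes to H-DDPG, without claiming anything about its actual networks or its model-based optimization step.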