{"title":"Relay Selection and Power Control for Mobile Underwater Acoustic Communication Networks: A Dual-Thread Reinforcement Learning Approach","authors":"Jun Dai;Xinbin Li;Song Han;Junzhi Yu;Zhixin Liu","doi":"10.1109/TGCN.2024.3445142","DOIUrl":null,"url":null,"abstract":"This paper deals with a cooperation communication problem (relay selection and power control) for mobile underwater acoustic communication networks. To achieve satisfactory transmission capacity, we propose a reinforcement-learning-based cooperation communication scheme to efficiently resist the highly dynamic communication links and strongly unknown time-varying channel states caused by the mobility of Autonomous Underwater Vehicles (AUVs). Firstly, a particular Markov decision process is developed to model the dynamic relay selection process of mobile AUV in the unknown scenario. In the developed model, an experimental statistical-based partition mechanism is proposed to cope with the greatly increasing dimension of the state space caused by the mobility of AUV, reducing the search optimization difficulty. Secondly, a dual-thread reinforcement learning structure with actual and virtual learning threads is proposed to efficiently track the superior relay action. In the actual learning thread, the proposed improved probability greedy policy enables the AUV to strengthen the exploration for the reward information of potential superior relays on the current state. Meanwhile, in the virtual learning thread, the proposed upper-confidence-bound-index-based uncertainty estimation method can estimate the action-reward level of historical states. Consequently, the combination of actual and virtual learning threads can efficiently obtain satisfactory Q value information, thereby making superior relay decision-making in a short time. Thirdly, a power control mechanism is proposed to reuse the current observed action-reward information and transform the multiple unknown parameter nonlinear joint power optimization problem into a convex optimization problem, thereby enhancing network transmission capacity. Finally, simulation results verify the effectiveness of the proposed scheme.","PeriodicalId":13052,"journal":{"name":"IEEE Transactions on Green Communications and Networking","volume":"9 2","pages":"698-710"},"PeriodicalIF":5.3000,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Green Communications and Networking","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10638127/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Abstract
This paper addresses a cooperative communication problem (relay selection and power control) for mobile underwater acoustic communication networks. To achieve satisfactory transmission capacity, we propose a reinforcement-learning-based cooperative communication scheme that copes with the highly dynamic communication links and the unknown, time-varying channel states caused by the mobility of Autonomous Underwater Vehicles (AUVs). Firstly, a dedicated Markov decision process is formulated to model the dynamic relay selection of a mobile AUV in unknown scenarios. Within this model, a partition mechanism based on experimental statistics is proposed to cope with the sharply growing dimension of the state space induced by AUV mobility, reducing the difficulty of the search for an optimal policy. Secondly, a dual-thread reinforcement learning structure with an actual and a virtual learning thread is proposed to efficiently track the superior relay action. In the actual learning thread, an improved probability-greedy policy lets the AUV intensify exploration of the reward information of potentially superior relays in the current state. Meanwhile, in the virtual learning thread, an uncertainty estimation method based on an upper-confidence-bound (UCB) index estimates the action-reward level of historical states. The combination of the two threads therefore accumulates reliable Q-value information efficiently, enabling superior relay decisions within a short time. Thirdly, a power control mechanism is proposed that reuses the currently observed action-reward information and transforms the nonlinear joint power optimization problem, which involves multiple unknown parameters, into a convex optimization problem, thereby enhancing network transmission capacity. Finally, simulation results verify the effectiveness of the proposed scheme.
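The abstract does not detail the partition mechanism, but the general idea it names (taming the state-space growth caused by AUV mobility by discretizing the continuous state) can be illustrated with a minimal sketch. Everything below is hypothetical: the bin edges, coordinate layout, and function names are illustrative choices, not the paper's design.

```python
import numpy as np

def partition_state(position, bin_edges):
    """Map a continuous AUV position to a discrete state index by
    quantizing each coordinate against empirically chosen bin edges.

    bin_edges: one 1-D array of edges per coordinate (e.g. x, y, depth),
    here picked arbitrarily; the paper derives its partition from
    experimental statistics, which this sketch does not reproduce.
    """
    idx = 0
    for coord, edges in zip(position, bin_edges):
        cell = int(np.searchsorted(edges, coord))  # which bin the coordinate falls in
        idx = idx * (len(edges) + 1) + cell        # mixed-radix flattening to one index
    return idx

# Example: a 3-D position quantized into a coarse grid (hypothetical ranges).
edges = [np.array([250.0, 500.0, 750.0])] * 2 + [np.array([50.0, 150.0])]
state = partition_state((420.0, 130.0, 80.0), edges)
print(state)
```

Coarser bins shrink the Q-table and speed up learning at the cost of lumping distinct positions together, which is the trade-off any such partition must balance.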
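The dual-thread structure can be sketched in the same spirit. In the fragment below, the actual thread selects relays with a probability-greedy rule (softmax-weighted exploration instead of uniform random choice), while the virtual thread replays historical transitions with a UCB-style bonus that raises the estimated action-reward level of rarely tried pairs. The learning rates, replay count, and exact bonus form are assumptions for illustration, not the paper's specification.

```python
import numpy as np

class DualThreadQLearner:
    """Illustrative dual-thread Q-learner for relay selection (hypothetical)."""

    def __init__(self, n_states, n_relays, alpha=0.1, gamma=0.9,
                 epsilon=0.2, ucb_c=1.0):
        self.Q = np.zeros((n_states, n_relays))
        self.visits = np.ones((n_states, n_relays))  # ones avoid div-by-zero in UCB
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.ucb_c = epsilon, ucb_c
        self.history = []  # observed (state, relay, reward, next_state) tuples

    def select_relay(self, state):
        """Actual thread: probability-greedy policy. With prob. epsilon,
        sample a relay proportionally to a softmax of its Q-value, so
        exploration favors potentially superior relays."""
        if np.random.rand() < self.epsilon:
            prefs = np.exp(self.Q[state] - self.Q[state].max())
            return int(np.random.choice(len(prefs), p=prefs / prefs.sum()))
        return int(np.argmax(self.Q[state]))

    def update_actual(self, s, a, r, s_next):
        """Actual thread: standard Q-learning update from a real transition."""
        self.visits[s, a] += 1
        self.history.append((s, a, r, s_next))
        td = r + self.gamma * self.Q[s_next].max() - self.Q[s, a]
        self.Q[s, a] += self.alpha * td

    def update_virtual(self, n_replays=5):
        """Virtual thread: replay historical states with a UCB bonus, so the
        uncertainty of rarely tried (state, relay) pairs inflates their
        estimated action-reward level."""
        if not self.history:
            return
        total = self.visits.sum()
        for _ in range(n_replays):
            s, a, r, s_next = self.history[np.random.randint(len(self.history))]
            bonus = self.ucb_c * np.sqrt(np.log(total) / self.visits[s, a])
            td = (r + bonus) + self.gamma * self.Q[s_next].max() - self.Q[s, a]
            self.Q[s, a] += self.alpha * td
```

Running `update_virtual` between real transmissions is what lets the learner refine Q-values faster than the physical channel alone would allow, which matches the abstract's claim of superior decisions "in a short time."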
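For the power control step, the abstract states only that the nonlinear joint power problem is recast as a convex one. A standard convex instance of this kind, maximizing the sum rate over links under a total power budget, is solved in closed form by water-filling; the sketch below illustrates that generic technique as a stand-in, not the paper's actual reformulation.

```python
import numpy as np

def waterfill_power(gains, p_total, noise=1.0):
    """Water-filling: maximize sum_i log(1 + g_i * p_i / noise)
    subject to sum(p_i) <= p_total, p_i >= 0. This is convex, and the
    KKT conditions yield the classic common water-level solution."""
    inv = noise / np.asarray(gains, dtype=float)  # per-link "floor" heights
    inv_sorted = np.sort(inv)
    # Find the largest k active links whose water level clears their floors.
    k = len(inv)
    while k > 1:
        level = (p_total + inv_sorted[:k].sum()) / k
        if level > inv_sorted[k - 1]:
            break
        k -= 1
    level = (p_total + inv_sorted[:k].sum()) / k
    return np.maximum(level - inv, 0.0)  # pour power up to the water level

# Example: allocate 10 W across three links with different channel gains.
print(waterfill_power([0.9, 0.5, 0.1], p_total=10.0))
# -> roughly [5.44, 4.56, 0.0]; the weakest link is switched off.
```

The point of such a convex reformulation is exactly what the abstract claims: once convexity is established, the joint power allocation can be computed reliably and quickly from the observed action-reward information instead of by searching a nonlinear landscape.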