{"title":"Relay Selection and Power Control for Mobile Underwater Acoustic Communication Networks: A Dual-Thread Reinforcement Learning Approach","authors":"Jun Dai;Xinbin Li;Song Han;Junzhi Yu;Zhixin Liu","doi":"10.1109/TGCN.2024.3445142","DOIUrl":null,"url":null,"abstract":"This paper deals with a cooperation communication problem (relay selection and power control) for mobile underwater acoustic communication networks. To achieve satisfactory transmission capacity, we propose a reinforcement-learning-based cooperation communication scheme to efficiently resist the highly dynamic communication links and strongly unknown time-varying channel states caused by the mobility of Autonomous Underwater Vehicles (AUVs). Firstly, a particular Markov decision process is developed to model the dynamic relay selection process of mobile AUV in the unknown scenario. In the developed model, an experimental statistical-based partition mechanism is proposed to cope with the greatly increasing dimension of the state space caused by the mobility of AUV, reducing the search optimization difficulty. Secondly, a dual-thread reinforcement learning structure with actual and virtual learning threads is proposed to efficiently track the superior relay action. In the actual learning thread, the proposed improved probability greedy policy enables the AUV to strengthen the exploration for the reward information of potential superior relays on the current state. Meanwhile, in the virtual learning thread, the proposed upper-confidence-bound-index-based uncertainty estimation method can estimate the action-reward level of historical states. Consequently, the combination of actual and virtual learning threads can efficiently obtain satisfactory Q value information, thereby making superior relay decision-making in a short time. Thirdly, a power control mechanism is proposed to reuse the current observed action-reward information and transform the multiple unknown parameter nonlinear joint power optimization problem into a convex optimization problem, thereby enhancing network transmission capacity. Finally, simulation results verify the effectiveness of the proposed scheme.","PeriodicalId":13052,"journal":{"name":"IEEE Transactions on Green Communications and Networking","volume":"9 2","pages":"698-710"},"PeriodicalIF":5.3000,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Green Communications and Networking","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10638127/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"TELECOMMUNICATIONS","Score":null,"Total":0}
Abstract
This paper addresses a cooperative communication problem (relay selection and power control) for mobile underwater acoustic communication networks. To achieve satisfactory transmission capacity, we propose a reinforcement-learning-based cooperative communication scheme that copes with the highly dynamic communication links and the unknown, time-varying channel states caused by the mobility of Autonomous Underwater Vehicles (AUVs). Firstly, a dedicated Markov decision process is formulated to model the dynamic relay selection of a mobile AUV in unknown scenarios. Within this model, a partition mechanism based on experimental statistics is proposed to cope with the sharply growing dimension of the state space induced by AUV mobility, reducing the difficulty of the search for an optimal policy. Secondly, a dual-thread reinforcement learning structure with an actual and a virtual learning thread is proposed to efficiently track the superior relay action. In the actual learning thread, an improved probability-greedy policy lets the AUV intensify exploration of the reward information of potentially superior relays in the current state. Meanwhile, in the virtual learning thread, an uncertainty estimation method based on an upper-confidence-bound (UCB) index estimates the action-reward level of historical states. The combination of the two threads therefore accumulates reliable Q-value information efficiently, enabling superior relay decisions within a short time. Thirdly, a power control mechanism is proposed that reuses the currently observed action-reward information and transforms the nonlinear joint power optimization problem, which involves multiple unknown parameters, into a convex optimization problem, thereby enhancing network transmission capacity. Finally, simulation results verify the effectiveness of the proposed scheme.
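The abstract does not detail the partition mechanism, but the general idea it names (taming the state-space growth caused by AUV mobility by discretizing the continuous state) can be illustrated with a minimal sketch. Everything below is hypothetical: the bin edges, coordinate layout, and function names are illustrative choices, not the paper's design.

```python
import numpy as np

def partition_state(position, bin_edges):
    """Map a continuous AUV position to a discrete state index by
    quantizing each coordinate against empirically chosen bin edges.

    bin_edges: one 1-D array of edges per coordinate (e.g. x, y, depth),
    here picked arbitrarily; the paper derives its partition from
    experimental statistics, which this sketch does not reproduce.
    """
    idx = 0
    for coord, edges in zip(position, bin_edges):
        cell = int(np.searchsorted(edges, coord))  # which bin the coordinate falls in
        idx = idx * (len(edges) + 1) + cell        # mixed-radix flattening to one index
    return idx

# Example: a 3-D position quantized into a coarse grid (hypothetical ranges).
edges = [np.array([250.0, 500.0, 750.0])] * 2 + [np.array([50.0, 150.0])]
state = partition_state((420.0, 130.0, 80.0), edges)
print(state)
```

Coarser bins shrink the Q-table and speed up learning at the cost of lumping distinct positions together, which is the trade-off any such partition must balance.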
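The dual-thread structure can be sketched in the same spirit. In the fragment below, the actual thread selects relays with a probability-greedy rule (softmax-weighted exploration instead of uniform random choice), while the virtual thread replays historical transitions with a UCB-style bonus that raises the estimated action-reward level of rarely tried pairs. The learning rates, replay count, and exact bonus form are assumptions for illustration, not the paper's specification.

```python
import numpy as np

class DualThreadQLearner:
    """Illustrative dual-thread Q-learner for relay selection (hypothetical)."""

    def __init__(self, n_states, n_relays, alpha=0.1, gamma=0.9,
                 epsilon=0.2, ucb_c=1.0):
        self.Q = np.zeros((n_states, n_relays))
        self.visits = np.ones((n_states, n_relays))  # ones avoid div-by-zero in UCB
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.ucb_c = epsilon, ucb_c
        self.history = []  # observed (state, relay, reward, next_state) tuples

    def select_relay(self, state):
        """Actual thread: probability-greedy policy. With prob. epsilon,
        sample a relay proportionally to a softmax of its Q-value, so
        exploration favors potentially superior relays."""
        if np.random.rand() < self.epsilon:
            prefs = np.exp(self.Q[state] - self.Q[state].max())
            return int(np.random.choice(len(prefs), p=prefs / prefs.sum()))
        return int(np.argmax(self.Q[state]))

    def update_actual(self, s, a, r, s_next):
        """Actual thread: standard Q-learning update from a real transition."""
        self.visits[s, a] += 1
        self.history.append((s, a, r, s_next))
        td = r + self.gamma * self.Q[s_next].max() - self.Q[s, a]
        self.Q[s, a] += self.alpha * td

    def update_virtual(self, n_replays=5):
        """Virtual thread: replay historical states with a UCB bonus, so the
        uncertainty of rarely tried (state, relay) pairs inflates their
        estimated action-reward level."""
        if not self.history:
            return
        total = self.visits.sum()
        for _ in range(n_replays):
            s, a, r, s_next = self.history[np.random.randint(len(self.history))]
            bonus = self.ucb_c * np.sqrt(np.log(total) / self.visits[s, a])
            td = (r + bonus) + self.gamma * self.Q[s_next].max() - self.Q[s, a]
            self.Q[s, a] += self.alpha * td
```

Running `update_virtual` between real transmissions is what lets the learner refine Q-values faster than the physical channel alone would allow, which matches the abstract's claim of superior decisions "in a short time."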
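For the power control step, the abstract states only that the nonlinear joint power problem is recast as a convex one. A standard convex instance of this kind, maximizing the sum rate over links under a total power budget, is solved in closed form by water-filling; the sketch below illustrates that generic technique as a stand-in, not the paper's actual reformulation.

```python
import numpy as np

def waterfill_power(gains, p_total, noise=1.0):
    """Water-filling: maximize sum_i log(1 + g_i * p_i / noise)
    subject to sum(p_i) <= p_total, p_i >= 0. This is convex, and the
    KKT conditions yield the classic common water-level solution."""
    inv = noise / np.asarray(gains, dtype=float)  # per-link "floor" heights
    inv_sorted = np.sort(inv)
    # Find the largest k active links whose water level clears their floors.
    k = len(inv)
    while k > 1:
        level = (p_total + inv_sorted[:k].sum()) / k
        if level > inv_sorted[k - 1]:
            break
        k -= 1
    level = (p_total + inv_sorted[:k].sum()) / k
    return np.maximum(level - inv, 0.0)  # pour power up to the water level

# Example: allocate 10 W across three links with different channel gains.
print(waterfill_power([0.9, 0.5, 0.1], p_total=10.0))
# -> roughly [5.44, 4.56, 0.0]; the weakest link is switched off.
```

The point of such a convex reformulation is exactly what the abstract claims: once convexity is established, the joint power allocation can be computed reliably and quickly from the observed action-reward information instead of by searching a nonlinear landscape.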