Reinforcement-Learning-Based Trajectory Design and Phase-Shift Control in UAV-Mounted-RIS Communications

Tianjiao Sun; Sixing Yin; Li Deng; F. Richard Yu
Journal: IEEE Transactions on Machine Learning in Communications and Networking, vol. 3, pp. 163-175
Publication date: 2024-11-19
Publication type: Journal Article
DOI: 10.1109/TMLCN.2024.3502576
Full text: https://ieeexplore.ieee.org/document/10758222/
PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10758222
Citations: 0

Abstract

By combining the advantages of unmanned aerial vehicles (UAVs) and reconfigurable intelligent surfaces (RISs), UAV-mounted-RIS systems are expected to enhance transmission performance in complicated wireless environments. In this paper, we focus on the design of a UAV-mounted-RIS system and investigate the joint optimization of the RIS’s phase shift and the UAV’s trajectory. To cope with the practical issue that the user terminals’ (UTs’) locations and channel states are inaccessible, a reinforcement learning (RL)-based solution is proposed to find the optimal policy within a finite number of “trial-and-error” steps. Because the action space is continuous, the deep deterministic policy gradient (DDPG) algorithm is applied to train the RL model. However, the online interaction between the agent and the environment may lead to instability during training, and the assumption of (first-order) Markovian state transitions could be impractical in real-world problems. Therefore, the decision transformer (DT) algorithm is employed as an alternative for RL model training to accommodate more general state-transition behavior. Experimental results demonstrate that the proposed RL solutions train efficiently and achieve acceptable performance close to that of the benchmark, which relies on conventional optimization algorithms with the UTs’ locations and channel parameters explicitly known beforehand.
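The abstract does not give implementation details, so the following is only a minimal sketch of one ingredient that separates a decision transformer from DDPG: instead of relying on a first-order Markov state, a DT is typically conditioned on a window of past (return-to-go, state, action) triples. The function names, the reward values, and the window length `K` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Compute R_t = sum over k >= t of gamma^(k - t) * r_k for every step t."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each R_t reuses R_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def context_window(
    rtg: Sequence[float],
    states: Sequence[Any],
    actions: Sequence[Any],
    t: int,
    K: int,
) -> List[Tuple[float, Any, Any]]:
    """Return the last K (return-to-go, state, action) triples ending at step t."""
    start = max(0, t - K + 1)
    return list(zip(rtg[start:t + 1], states[start:t + 1], actions[start:t + 1]))


# Hypothetical per-step rewards, e.g. the achievable rate at each UAV waypoint.
rewards = [1.0, 2.0, 0.5, 3.0]
# Placeholder labels standing in for state and action vectors.
states = ["s0", "s1", "s2", "s3"]
actions = ["a0", "a1", "a2", "a3"]

rtg = returns_to_go(rewards)
window = context_window(rtg, states, actions, t=2, K=2)
print(rtg)
print(window)
```

Because the model sees a window of length `K` rather than only the current state, it does not require the state transition to be first-order Markovian, which is the motivation the abstract gives for using DT alongside DDPG.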