Multi-Agent Reinforcement Learning With Action Masking for UAV-Enabled Mobile Communications

Danish Rizvi;David Boyle
{"title":"Multi-Agent Reinforcement Learning With Action Masking for UAV-Enabled Mobile Communications","authors":"Danish Rizvi;David Boyle","doi":"10.1109/TMLCN.2024.3521876","DOIUrl":null,"url":null,"abstract":"Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure. Building upon prior research efforts which consider either static nodes, 2D trajectories or single UAV systems, this paper focuses on the use of multiple UAVs for providing wireless communication to mobile users in the absence of terrestrial communications infrastructure. In particular, we jointly optimize UAV 3D trajectory and NOMA power allocation to maximize system throughput. Firstly, a weighted K-means-based clustering algorithm establishes UAV-user associations at regular intervals. Then the efficacy of training a novel Shared Deep Q-Network (SDQN) with action masking is explored. Unlike training each UAV separately using DQN, the SDQN reduces training time by using the experiences of multiple UAVs instead of a single agent. We also show that SDQN can be used to train a multi-agent system with differing action spaces. Simulation results confirm that: 1) training a shared DQN outperforms a conventional DQN in terms of maximum system throughput (+20%) and training time (-10%); 2) it can converge for agents with different action spaces, yielding a 9% increase in throughput compared to Mutual DQN algorithm; and 3) combining NOMA with an SDQN architecture enables the network to achieve a better sum rate compared with existing baseline schemes.","PeriodicalId":100641,"journal":{"name":"IEEE Transactions on Machine Learning in Communications and Networking","volume":"3 ","pages":"117-132"},"PeriodicalIF":0.0000,"publicationDate":"2024-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10812765","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Machine Learning in Communications and Networking","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10812765/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Unmanned Aerial Vehicles (UAVs) are increasingly used as aerial base stations to provide ad hoc communications infrastructure. Building upon prior research efforts which consider either static nodes, 2D trajectories or single UAV systems, this paper focuses on the use of multiple UAVs for providing wireless communication to mobile users in the absence of terrestrial communications infrastructure. In particular, we jointly optimize UAV 3D trajectory and NOMA power allocation to maximize system throughput. Firstly, a weighted K-means-based clustering algorithm establishes UAV-user associations at regular intervals. Then the efficacy of training a novel Shared Deep Q-Network (SDQN) with action masking is explored. Unlike training each UAV separately using DQN, the SDQN reduces training time by using the experiences of multiple UAVs instead of a single agent. We also show that SDQN can be used to train a multi-agent system with differing action spaces. Simulation results confirm that: 1) training a shared DQN outperforms a conventional DQN in terms of maximum system throughput (+20%) and training time (-10%); 2) it can converge for agents with different action spaces, yielding a 9% increase in throughput compared to Mutual DQN algorithm; and 3) combining NOMA with an SDQN architecture enables the network to achieve a better sum rate compared with existing baseline schemes.
基于动作掩蔽的无人机移动通信多智能体强化学习
无人驾驶飞行器(uav)越来越多地被用作空中基站,以提供自组织通信基础设施。在先前考虑静态节点、二维轨迹或单个无人机系统的研究成果的基础上,本文重点研究了在没有地面通信基础设施的情况下,使用多个无人机为移动用户提供无线通信。特别是,我们共同优化了无人机的3D轨迹和NOMA功率分配,以最大限度地提高系统吞吐量。首先,基于加权k均值的聚类算法以一定的间隔建立无人机用户关联。然后探讨了带动作掩蔽的新型共享深度q网络(SDQN)的训练效果。与使用DQN单独训练每架无人机不同,SDQN通过使用多架无人机的经验而不是单个代理来减少训练时间。我们还证明了SDQN可以用于训练具有不同动作空间的多智能体系统。仿真结果证实:1)在最大系统吞吐量(+20%)和训练时间(-10%)方面,训练共享DQN优于传统DQN;2)对于具有不同动作空间的智能体可以收敛,吞吐量比Mutual DQN算法提高9%;3)与现有基准方案相比,NOMA与SDQN架构的结合使网络能够获得更好的求和速率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信