Joint UAV trajectory and communication design with heterogeneous multi-agent reinforcement learning

IF 7.6 | CAS Zone 2 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Science China Information Sciences, Pub Date: 2024-02-20, DOI: 10.1007/s11432-023-3906-3

Xuanhan Zhou, Jun Xiong, Haitao Zhao, Xiaoran Liu, Baoquan Ren, Xiaochen Zhang, Jibo Wei, Hao Yin


Abstract

Unmanned aerial vehicles (UAVs) are recognized as an effective means of delivering emergency communication services when terrestrial infrastructure is unavailable. This paper investigates a multi-UAV-assisted communication system in which we jointly optimize the UAVs' trajectories, the user association, and the transmit power of ground users (GUs) to maximize a fairness-weighted throughput metric. Because the UAVs are highly mobile, this problem must be solved in real time. However, the problem is non-convex and combinatorial, which makes it difficult for conventional optimization-based algorithms, particularly in scenarios without a central controller. To address this issue, we propose a multi-agent deep reinforcement learning (MADRL) approach that provides distributed, online solutions. Unlike previous MADRL-based methods that consider only UAV agents, we model UAVs and GUs as heterogeneous agents sharing a common objective. Specifically, UAVs optimize their trajectories, while each GU selects a UAV to associate with and a transmit power level. To learn policies for these heterogeneous agents, we design a heterogeneous coordinated QMIX (HC-QMIX) algorithm that trains the local Q-networks centrally. With the trained local Q-networks, UAVs and GUs make individual decisions based on their local observations. Extensive simulations show that the proposed algorithm outperforms state-of-the-art benchmarks in both total throughput and system fairness.
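The abstract describes centralized training with decentralized execution: each UAV and GU acts on its own local Q-network, while a QMIX-style mixer combines the agents' chosen Q-values into a team value Q_tot during training. The sketch below is a minimal, stdlib-only illustration of plain QMIX monotonic mixing, not the authors' HC-QMIX. The action sets (five UAV moves, two UAVs, three GU power levels), the flat GU action encoding, the simplified linear hypernetworks, and all numbers are hypothetical. Because the mixing weights are forced non-negative and ELU is increasing, Q_tot cannot decrease when any agent's Q-value rises, which is what allows each agent's local argmax to agree with the argmax of Q_tot.

```python
import math
import random

# Hypothetical action spaces, chosen only for illustration (not taken from the paper).
UAV_MOVES = ["hover", "north", "south", "east", "west"]
N_UAVS = 2
N_POWER_LEVELS = 3  # discrete GU transmit-power levels (assumed)


def decode_gu_action(a: int) -> tuple[int, int]:
    """Split a flat GU action index into (associated UAV index, power level)."""
    return divmod(a, N_POWER_LEVELS)


def greedy(q_values: list[float]) -> int:
    """Decentralized execution: an agent picks the argmax of its own local Q-values."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def elu(x: float) -> float:
    """ELU activation; strictly increasing, so it preserves monotonicity."""
    return x if x > 0 else math.exp(x) - 1.0


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class MonotonicMixer:
    """Plain QMIX-style mixer (not the paper's HC-QMIX):
    Q_tot = w2 . elu(W1^T q + b1) + b2, where W1, b1, w2, b2 are produced from the
    global state by (simplified, linear) hypernetworks. abs() keeps W1 and w2
    non-negative, so Q_tot is non-decreasing in every agent's Q-value."""

    def __init__(self, n_agents: int, state_dim: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def rand_vec() -> list[float]:
            return [rng.uniform(-1.0, 1.0) for _ in range(state_dim)]

        self.n_agents, self.hidden = n_agents, hidden
        # Hypernetwork parameters: random here, learned end-to-end in real QMIX training.
        self.hyper_w1 = [[rand_vec() for _ in range(hidden)] for _ in range(n_agents)]
        self.hyper_b1 = [rand_vec() for _ in range(hidden)]
        self.hyper_w2 = [rand_vec() for _ in range(hidden)]
        self.hyper_b2 = rand_vec()

    def __call__(self, agent_qs: list[float], state: list[float]) -> float:
        hidden_out = []
        for j in range(self.hidden):
            pre = sum(abs(dot(self.hyper_w1[i][j], state)) * agent_qs[i]
                      for i in range(self.n_agents))
            hidden_out.append(elu(pre + dot(self.hyper_b1[j], state)))
        w2 = [abs(dot(self.hyper_w2[j], state)) for j in range(self.hidden)]
        return dot(w2, hidden_out) + dot(self.hyper_b2, state)


if __name__ == "__main__":
    # Local Q-values that each agent's Q-network might output (made-up numbers).
    uav_qs = [[0.1, 0.5, -0.2, 0.3, 0.0], [0.4, 0.2, 0.9, -0.1, 0.0]]
    gu_qs = [[0.2, 0.7, 0.1, 0.3, 0.0, 0.6]]  # N_UAVS * N_POWER_LEVELS actions

    uav_actions = [greedy(q) for q in uav_qs]
    gu_actions = [greedy(q) for q in gu_qs]
    print([UAV_MOVES[a] for a in uav_actions])        # ['north', 'south']
    print([decode_gu_action(a) for a in gu_actions])  # [(0, 1)]

    # Centralized training only: mix the chosen Q-values using the global state.
    chosen = [q[a] for q, a in zip(uav_qs + gu_qs, uav_actions + gu_actions)]
    state = [0.2, -0.1, 0.5, 1.0]  # hypothetical global state
    mixer = MonotonicMixer(n_agents=len(chosen), state_dim=len(state))
    print(mixer(chosen, state))
```

Here the GU's greedy action decodes to (0, 1), i.e. associate with UAV 0 at power level 1. In standard QMIX, the agent Q-networks and hypernetworks are trained jointly by minimizing a temporal-difference loss on Q_tot; since the state-dependent mixer is needed only during training, execution relies on local observations alone.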


Source journal

Science China Information Sciences (COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 12.60

Self-citation rate: 5.70%

Articles published: 224

Review time: 8.3 months

Journal description: Science China Information Sciences is a journal dedicated to high-quality, original research across the information sciences, including Computer Science & Technologies, Control Science & Engineering, Information & Communication Engineering, Microelectronics & Solid-State Electronics, and Quantum Information. It provides a platform for disseminating significant contributions in these fields.