Beamforming design and trajectory optimization for integrated sensing and communication supported by multiple UAVs based on DRL

IF 6.5 · CAS Zone 2 (Computer Science) · JCR Q1 (Telecommunications)

Vehicular Communications · Pub Date: 2025-05-05 · DOI: 10.1016/j.vehcom.2025.100932

Zekun Lu, Linbo Zhai, Wenjie Zhou, Kai Xue, Xingxia Gao

Vehicular Communications, Volume 54, Article 100932 · https://www.sciencedirect.com/science/article/pii/S2214209625000592

Citations: 0

Abstract

Rapid advances in unmanned aerial vehicle (UAV) technology, together with the flexibility and maneuverability of UAVs, position them to play an important role in the future development of integrated sensing and communication (ISAC). This paper studies a communication and sensing system supported by multiple UAVs and proposes a new balanced ISAC mode (BISAC). In this mode, while a UAV communicates with ground equipment (GEs), the sensing time is set according to the number of potential targets (PTs) and the sensing requirements, which reduces the interaction between communication and sensing and improves resource utilization. We also introduce the Age of Information (AoI) to measure the freshness of the GEs' data and thereby reduce delay. Our goal is to minimize the average AoI of the GEs by jointly optimizing the UAV trajectory, user association, target sensing selection, and communication and sensing beamforming, while maintaining communication quality and meeting sensing requirements. To obtain long-term AoI performance and effectively solve a non-convex problem with both continuous and discrete variables, we propose a deep reinforcement learning (DRL) algorithm that combines deep deterministic policy gradient (DDPG) with Dueling Double Deep Q networks (D3QN). The continuous and discrete variables in the system are handled by invoking DDPG and D3QN. Specifically, we improve DDPG's actor-critic structure by incorporating D3QN: the actor part of DDPG searches for the optimal communication and sensing beams, while the critic part of DDPG is combined with D3QN to select the optimal flight direction of the UAV. Simulation results show that the proposed DDPG-D3QN algorithm achieves better stability, faster convergence, and higher reward than existing DRL-based methods.
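The abstract does not give formulas, so the following is only a minimal sketch of three standard building blocks it names, written under stated assumptions rather than as the authors' implementation. It assumes a slotted AoI model in which a GE's age grows by one per slot and resets to 1 when it is served, the usual dueling aggregation Q = V + A - mean(A), and the usual double-Q target in which the online network picks the action and the target network scores it. All function and parameter names (`step_aoi`, `dueling_q`, `double_q_target`, `served`, `gamma`) are hypothetical.

```python
import numpy as np


def step_aoi(aoi, served):
    """Advance per-GE ages by one slot.

    Assumed convention (not from the paper): a served GE resets to age 1,
    every other GE ages by one slot.
    """
    aoi = np.asarray(aoi)
    served = np.asarray(served, dtype=bool)
    return np.where(served, 1, aoi + 1)


def dueling_q(value, advantage):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    advantage = np.asarray(advantage, dtype=float)
    return value + advantage - advantage.mean(axis=-1, keepdims=True)


def double_q_target(reward, gamma, q_online_next, q_target_next, done=False):
    """Double-Q target: online net selects the action, target net evaluates it."""
    best_action = int(np.argmax(q_online_next))
    bootstrap = 0.0 if done else gamma * float(q_target_next[best_action])
    return reward + bootstrap


if __name__ == "__main__":
    # Three GEs; one is served per slot in this toy schedule.
    aoi = np.array([1, 1, 1])
    for served in ([True, False, False], [False, True, False]):
        aoi = step_aoi(aoi, served)
    print("AoI per GE:", aoi, "average:", aoi.mean())

    # Q-values over three hypothetical discrete flight directions.
    print("Q:", dueling_q(1.0, [1.0, 2.0, 3.0]))
    print("target:", double_q_target(1.0, 0.9, np.array([0.2, 0.8]), np.array([5.0, 3.0])))
```

In a setup like the one described, a reward derived from the negative average AoI would push the agent toward serving stale GEs sooner; how the paper actually shapes its reward and couples the DDPG critic with D3QN is not specified in the abstract.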


Source journal

Vehicular Communications (Engineering: Electrical and Electronic Engineering)

CiteScore

12.70

Self-citation rate

10.40%

Articles published

Review time

62 days

About the journal: Vehicular communications is a growing area covering communication between vehicles and with roadside infrastructure. Advances in wireless communications are making it possible to share information in real time between vehicles and infrastructure. This has led to applications that increase vehicle safety and connect passengers to the Internet. Standardization efforts on vehicular communication are also underway to make vehicular transportation safer, greener and easier. The aim of the journal is to publish high-quality peer-reviewed papers in the area of vehicular communications. The scope encompasses all types of communications involving vehicles, including vehicle-to-vehicle and vehicle-to-infrastructure. The scope includes (but is not limited to) the following topics related to vehicular communications: vehicle-to-vehicle and vehicle-to-infrastructure communications; channel modelling, modulating and coding; congestion control and scalability issues; protocol design, testing and verification; routing in vehicular networks; security issues and countermeasures; deployment and field testing; reducing energy consumption and enhancing safety of vehicles; wireless in-car networks; data collection and dissemination methods; mobility and handover issues; safety and driver assistance applications; UAV; underwater communications; autonomous cooperative driving; social networks; Internet of vehicles; standardization of protocols.