Beamforming design and trajectory optimization for integrated sensing and communication supported by multiple UAVs based on DRL
Zekun Lu, Linbo Zhai, Wenjie Zhou, Kai Xue, Xingxia Gao
Vehicular Communications, vol. 54, Article 100932, published 2025-05-05
DOI: 10.1016/j.vehcom.2025.100932
https://www.sciencedirect.com/science/article/pii/S2214209625000592
Abstract
Rapid advances in unmanned aerial vehicle (UAV) technology, together with the flexibility and maneuverability of UAVs, position them to play an important role in the future development of integrated sensing and communication (ISAC). This paper studies a communication and sensing system supported by multiple UAVs and proposes a new ISAC balance mode (BISAC). In this mode, while a UAV communicates with ground equipment (GEs), the sensing time is set according to the number of potential targets (PTs) and the sensing requirements, which reduces the interaction between communication and sensing and improves resource utilization. We also introduce the Age of Information (AoI) to measure the freshness of the GEs' data and thereby reduce delay. Our goal is to minimize the average AoI of the GEs by jointly optimizing the UAV trajectory, user association, target sensing selection, and communication and sensing beamforming, while maintaining communication quality and meeting sensing requirements. To obtain long-term AoI performance and to solve the resulting non-convex problem with both continuous and discrete variables, we propose a deep reinforcement learning (DRL) algorithm that combines deep deterministic policy gradient (DDPG) with dueling double deep Q-networks (D3QN), so that DDPG and D3QN together handle the continuous and discrete variables. Specifically, we improve DDPG's actor-critic structure by incorporating D3QN: the actor part of DDPG searches for the optimal communication and sensing beams, while the critic part of DDPG is combined with D3QN to select the optimal flight direction of the UAV. Simulation results show that the proposed DDPG-D3QN algorithm achieves better stability, faster convergence, and higher reward than existing DRL-based methods.
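The abstract does not spell out how AoI evolves, so the following is only a minimal sketch of one common discrete-time convention, not the authors' model: each GE's age grows by one per time slot and resets to 1 in a slot where its update is delivered. The function names, the reset-to-1 rule, and the delivery pattern are all assumptions made for illustration.

```python
from typing import List, Sequence


def step_aoi(aoi: Sequence[int], delivered: Sequence[bool]) -> List[int]:
    """Advance one slot: reset a GE's age to 1 on delivery, otherwise age by 1.

    The reset-to-1 convention is an assumption, not taken from the paper.
    """
    return [1 if ok else age + 1 for age, ok in zip(aoi, delivered)]


def average_aoi(history: Sequence[Sequence[int]]) -> float:
    """Mean age over all recorded slots and all GEs."""
    total = sum(sum(slot) for slot in history)
    count = sum(len(slot) for slot in history)
    return total / count


# Hypothetical scenario: three GEs over four slots.
# deliveries[t][k] is True if GE k's update was received in slot t.
deliveries = [
    [True, False, False],
    [False, True, False],
    [False, False, False],
    [True, False, True],
]

aoi = [1, 1, 1]
history = []
for delivered in deliveries:
    aoi = step_aoi(aoi, delivered)
    history.append(aoi)

print(history)
print(average_aoi(history))
```

Under this convention, a GE that is served rarely accumulates a large age, so minimizing the average pushes the UAVs to revisit every GE regularly rather than serving only the nearest ones.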
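For readers unfamiliar with D3QN, the two standard ingredients behind the name can be sketched in a few lines: the dueling aggregation Q(s, a) = V(s) + A(s, a) - mean_a A(s, a), and the double-DQN target, in which the online network chooses the next action and the target network evaluates it. This is a generic illustration only. It does not reproduce how the authors couple D3QN with the DDPG critic, and the four flight directions, the reward value, and all numbers below are hypothetical.

```python
from typing import List, Sequence


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool,
) -> float:
    """Double DQN target: online net selects the action, target net scores it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


# Hypothetical next-state outputs for four discrete flight directions.
q_online = dueling_q(2.0, [0.5, -0.5, 1.0, -1.0])  # online network
q_target = dueling_q(1.0, [0.0, 2.0, 1.0, 1.0])    # target network

# Hypothetical reward, e.g. a negative average AoI for the current slot.
target = double_dqn_target(
    reward=-3.0, gamma=0.9,
    q_online_next=q_online, q_target_next=q_target, done=False,
)
print(q_online, q_target, target)
```

In this example the online network prefers the third direction, so the target uses the target network's estimate for that direction rather than the target network's own maximum. Decoupling selection from evaluation in this way is what double DQN uses to reduce overestimation of action values.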
About the journal:
Vehicular communications is a growing area that covers communication between vehicles and with roadside communication infrastructure. Advances in wireless communications are making it possible to share information through real-time communication between vehicles and infrastructure. This has led to applications that increase vehicle safety and enable communication between passengers and the Internet. Standardization efforts on vehicular communication are also underway to make vehicular transportation safer, greener and easier.
The aim of the journal is to publish high-quality peer-reviewed papers in the area of vehicular communications. The scope encompasses all types of communications involving vehicles, including vehicle-to-vehicle and vehicle-to-infrastructure. The scope includes (but is not limited to) the following topics related to vehicular communications:
Vehicle to vehicle and vehicle to infrastructure communications
Channel modelling, modulation and coding
Congestion control and scalability issues
Protocol design, testing and verification
Routing in vehicular networks
Security issues and countermeasures
Deployment and field testing
Reducing energy consumption and enhancing safety of vehicles
Wireless in-car networks
Data collection and dissemination methods
Mobility and handover issues
Safety and driver assistance applications
UAV
Underwater communications
Autonomous cooperative driving
Social networks
Internet of vehicles
Standardization of protocols