{"title":"UAV formation control based on ensemble reinforcement learning","authors":"Kaifeng Wu , Lei Liu , Chengqing Liang , Lei Li","doi":"10.1016/j.neucom.2025.131056","DOIUrl":null,"url":null,"abstract":"<div><div>Based on the frameworks of Multi-Agent Deep Deterministic Policy Gradient (MADDPG) and Deep Deterministic Policy Gradient (DDPG) algorithms, this paper investigates the UAV formation control problem. To address the convergence difficulties inherent in multi-agent algorithms, curriculum reinforcement learning is applied during the training phase to decompose the task into incremental stages. A progressively hierarchical reward function tailored for each stage is designed, significantly reducing the training complexity of MADDPG. In the inference phase, an ensemble reinforcement learning strategy is adopted to enhance the accuracy of UAV formation control. When the UAVs approach their target positions, the control strategy switches from MADDPG to the DDPG algorithm, thus achieving more efficient and precise control. Through ablation and comparative experiments in a self-developed Software in the Loop (SITL) simulation environment, the effectiveness and stability of the ensemble reinforcement learning algorithm in multi-agent scenarios are validated. Finally, real-world experiments further verify the practical applicability of the proposed algorithm (<span><span>https://b23.tv/7ceLpLe</span><svg><path></path></svg></span>).</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"651 ","pages":"Article 131056"},"PeriodicalIF":5.5000,"publicationDate":"2025-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S092523122501728X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
Based on the frameworks of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) and Deep Deterministic Policy Gradient (DDPG) algorithms, this paper investigates the UAV formation control problem. To address the convergence difficulties inherent in multi-agent algorithms, curriculum reinforcement learning is applied during the training phase to decompose the task into incremental stages. A progressively hierarchical reward function tailored to each stage is designed, significantly reducing the training complexity of MADDPG. In the inference phase, an ensemble reinforcement learning strategy is adopted to enhance the accuracy of UAV formation control: when the UAVs approach their target positions, the control strategy switches from MADDPG to the DDPG algorithm, achieving more efficient and precise control. Ablation and comparative experiments in a self-developed Software-in-the-Loop (SITL) simulation environment validate the effectiveness and stability of the ensemble reinforcement learning algorithm in multi-agent scenarios. Finally, real-world experiments further verify the practical applicability of the proposed algorithm (https://b23.tv/7ceLpLe).
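To make the inference-phase switching rule concrete, the sketch below illustrates one way the ensemble strategy described above could be wired up: the MADDPG actor drives each UAV while it is far from its formation slot, and control hands over to the DDPG actor once the UAV is within a switching radius of its target position. This is not the authors' implementation; the observation layout, the placeholder actors, the function names, and the threshold value are all assumptions made for illustration.

```python
# Hedged sketch of the MADDPG-to-DDPG switching rule from the abstract.
# All names, the observation layout, and SWITCH_RADIUS are illustrative assumptions.
import numpy as np

SWITCH_RADIUS = 2.0  # assumed switching distance (metres); not specified in the abstract


def maddpg_action(obs: np.ndarray) -> np.ndarray:
    """Placeholder for the trained MADDPG actor of one UAV (coarse cooperative control)."""
    # Stand-in behaviour: a proportional step toward the target encoded in the observation.
    return 0.5 * (obs[3:6] - obs[0:3])


def ddpg_action(obs: np.ndarray) -> np.ndarray:
    """Placeholder for the trained DDPG actor (fine control near the target position)."""
    return 0.2 * (obs[3:6] - obs[0:3])


def ensemble_action(obs: np.ndarray) -> np.ndarray:
    """Select the acting policy based on the UAV's distance to its target position.

    Assumed layout: obs[:3] is the current UAV position, obs[3:6] is its target slot.
    """
    distance = np.linalg.norm(obs[3:6] - obs[0:3])
    if distance < SWITCH_RADIUS:
        return ddpg_action(obs)   # precise terminal control near the slot
    return maddpg_action(obs)     # cooperative coarse control far from the slot


if __name__ == "__main__":
    # Example: one UAV 5 m from its slot (MADDPG branch), then ~0.7 m away (DDPG branch).
    far = np.array([0.0, 0.0, 0.0, 3.0, 4.0, 0.0])
    near = np.array([2.5, 3.5, 0.0, 3.0, 4.0, 0.0])
    print(ensemble_action(far))
    print(ensemble_action(near))
```

In a full system the placeholder actors would be replaced by the trained MADDPG and DDPG networks, and the switching radius would be tuned against the formation-accuracy requirements reported in the experiments.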
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. The essential topics covered are neurocomputing theory, practice, and applications.