Decentralized Consensus Inference-Based Hierarchical Reinforcement Learning for Multiconstrained UAV Pursuit-Evasion Game

IF 8.9 · CAS Zone 1 (Computer Science) · Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
{"title":"Decentralized Consensus Inference-Based Hierarchical Reinforcement Learning for Multiconstrained UAV Pursuit-Evasion Game.","authors":"Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao, Honggang Zhang","doi":"10.1109/TNNLS.2025.3582909","DOIUrl":null,"url":null,"abstract":"<p><p>Multiple quadrotor uncrewed aerial vehicles (UAVs) systems have garnered widespread research interest and fostered tremendous interesting applications, especially in multiconstrained pursuit-evasion games (MC-PEGs). The cooperative evasion and formation coverage (CEFC) task, where the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, belongs to one of the most challenging issues in MC-PEGs, especially under communication-limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, brings up significant high-dimensional complications in locating a solution. In this article, we propose a novel two-level framework [i.e., consensus inference-based hierarchical reinforcement learning (CI-HRL)], which delegates target localization to a high-level policy, while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multiagent reinforcement learning (RL) module, consensus-oriented multiagent communication (ConsMAC), to enable agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage an alternative training-based MAPPO (AT-M) and policy distillation to accomplish the low-level control. The experimental results, including the high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution with enhanced swarm's collaborative evasion and task completion capabilities.</p>","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"PP ","pages":"18229-18243"},"PeriodicalIF":8.9000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/TNNLS.2025.3582909","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Multi-quadrotor uncrewed aerial vehicle (UAV) systems have garnered widespread research interest and fostered many compelling applications, especially in multiconstrained pursuit-evasion games (MC-PEGs). The cooperative evasion and formation coverage (CEFC) task, in which the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, is among the most challenging problems in MC-PEGs, particularly under communication-limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, introduces significant high-dimensional complexity into the search for a solution. In this article, we propose a novel two-level framework, consensus inference-based hierarchical reinforcement learning (CI-HRL), which delegates target localization to a high-level policy while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multiagent reinforcement learning (RL) module, consensus-oriented multiagent communication (ConsMAC), which enables agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage alternative training-based MAPPO (AT-M) and policy distillation to accomplish the low-level control. Experimental results, including high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution with enhanced collaborative evasion and task-completion capabilities for the swarm.
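To make the two-level structure concrete, below is a minimal PyTorch sketch of how such a framework could be wired together: a ConsMAC-style encoder that builds a consensus embedding by attention-pooling neighbor messages, a high-level policy that infers a target goal from local state plus that embedding, and a low-level policy that conditions control on the goal. All class names, dimensions, and the attention-based aggregation are illustrative assumptions for exposition, not the authors' published implementation.

```python
# Hypothetical sketch of a CI-HRL-style two-level policy; module names,
# sizes, and the attention pooling are assumptions, not the paper's code.
import torch
import torch.nn as nn

class ConsMACEncoder(nn.Module):
    """Aggregates neighbor messages into a consensus embedding
    (assumed attention pooling; the paper's aggregation may differ)."""
    def __init__(self, obs_dim: int, msg_dim: int, embed_dim: int):
        super().__init__()
        self.query = nn.Linear(obs_dim, embed_dim)
        self.key = nn.Linear(msg_dim, embed_dim)
        self.value = nn.Linear(msg_dim, embed_dim)

    def forward(self, local_obs, neighbor_msgs):
        # local_obs: (B, obs_dim); neighbor_msgs: (B, N, msg_dim)
        q = self.query(local_obs).unsqueeze(1)            # (B, 1, E)
        k = self.key(neighbor_msgs)                       # (B, N, E)
        v = self.value(neighbor_msgs)                     # (B, N, E)
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
        attn = torch.softmax(scores, dim=-1)              # (B, 1, N)
        return (attn @ v).squeeze(1)                      # (B, E)

class HighLevelPolicy(nn.Module):
    """Infers a target-zone goal from local state and consensus embedding
    (here messages are assumed to be neighbors' observations)."""
    def __init__(self, obs_dim: int, embed_dim: int, goal_dim: int):
        super().__init__()
        self.consmac = ConsMACEncoder(obs_dim, obs_dim, embed_dim)
        self.head = nn.Sequential(nn.Linear(obs_dim + embed_dim, 128),
                                  nn.ReLU(), nn.Linear(128, goal_dim))

    def forward(self, local_obs, neighbor_msgs):
        consensus = self.consmac(local_obs, neighbor_msgs)
        return self.head(torch.cat([local_obs, consensus], dim=-1))

class LowLevelPolicy(nn.Module):
    """Tracks the goal while handling avoidance/navigation/formation."""
    def __init__(self, obs_dim: int, goal_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, 128),
                                 nn.ReLU(), nn.Linear(128, act_dim))

    def forward(self, local_obs, goal):
        return self.net(torch.cat([local_obs, goal], dim=-1))
```

In use, each agent would run the high-level policy at a coarse timescale to pick a goal and query the low-level policy every control step; the abstract indicates the low level is trained with AT-M (a MAPPO variant) and compressed via policy distillation, which this sketch does not reproduce.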

Source journal: IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Annual article output: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.