Decentralized Consensus Inference-Based Hierarchical Reinforcement Learning for Multiconstrained UAV Pursuit-Evasion Game

IF 8.9 · CAS Zone 1 (Computer Science) · Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
{"title":"Decentralized Consensus Inference-Based Hierarchical Reinforcement Learning for Multiconstrained UAV Pursuit-Evasion Game.","authors":"Yuming Xiang, Sizhao Li, Rongpeng Li, Zhifeng Zhao, Honggang Zhang","doi":"10.1109/TNNLS.2025.3582909","DOIUrl":null,"url":null,"abstract":"<p><p>Multiple quadrotor uncrewed aerial vehicles (UAVs) systems have garnered widespread research interest and fostered tremendous interesting applications, especially in multiconstrained pursuit-evasion games (MC-PEGs). The cooperative evasion and formation coverage (CEFC) task, where the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, belongs to one of the most challenging issues in MC-PEGs, especially under communication-limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, brings up significant high-dimensional complications in locating a solution. In this article, we propose a novel two-level framework [i.e., consensus inference-based hierarchical reinforcement learning (CI-HRL)], which delegates target localization to a high-level policy, while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multiagent reinforcement learning (RL) module, consensus-oriented multiagent communication (ConsMAC), to enable agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage an alternative training-based MAPPO (AT-M) and policy distillation to accomplish the low-level control. The experimental results, including the high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution with enhanced swarm's collaborative evasion and task completion capabilities.</p>","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"PP ","pages":"18229-18243"},"PeriodicalIF":8.9000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on neural networks and learning systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1109/TNNLS.2025.3582909","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Multi-quadrotor uncrewed aerial vehicle (UAV) systems have garnered widespread research interest and fostered many compelling applications, especially in multiconstrained pursuit-evasion games (MC-PEGs). The cooperative evasion and formation coverage (CEFC) task, in which the UAV swarm aims to maximize formation coverage across multiple target zones while collaboratively evading predators, is among the most challenging problems in MC-PEGs, particularly under communication-limited constraints. This multifaceted problem, which intertwines responses to obstacles, adversaries, target zones, and formation dynamics, introduces significant high-dimensional complexity into the search for a solution. In this article, we propose a novel two-level framework, consensus inference-based hierarchical reinforcement learning (CI-HRL), which delegates target localization to a high-level policy while adopting a low-level policy to manage obstacle avoidance, navigation, and formation. Specifically, in the high-level policy, we develop a novel multiagent reinforcement learning (RL) module, consensus-oriented multiagent communication (ConsMAC), which enables agents to perceive global information and establish consensus from local states by effectively aggregating neighbor messages. Meanwhile, we leverage alternative training-based MAPPO (AT-M) and policy distillation to accomplish the low-level control. Experimental results, including high-fidelity software-in-the-loop (SITL) simulations, validate that CI-HRL provides a superior solution with enhanced collaborative evasion and task-completion capabilities for the swarm.
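To make the two-level structure concrete, below is a minimal PyTorch sketch of how such a framework could be wired together: a ConsMAC-style encoder that builds a consensus embedding by attention-pooling neighbor messages, a high-level policy that infers a target goal from local state plus that embedding, and a low-level policy that conditions control on the goal. All class names, dimensions, and the attention-based aggregation are illustrative assumptions for exposition, not the authors' published implementation.

```python
# Hypothetical sketch of a CI-HRL-style two-level policy; module names,
# sizes, and the attention pooling are assumptions, not the paper's code.
import torch
import torch.nn as nn

class ConsMACEncoder(nn.Module):
    """Aggregates neighbor messages into a consensus embedding
    (assumed attention pooling; the paper's aggregation may differ)."""
    def __init__(self, obs_dim: int, msg_dim: int, embed_dim: int):
        super().__init__()
        self.query = nn.Linear(obs_dim, embed_dim)
        self.key = nn.Linear(msg_dim, embed_dim)
        self.value = nn.Linear(msg_dim, embed_dim)

    def forward(self, local_obs, neighbor_msgs):
        # local_obs: (B, obs_dim); neighbor_msgs: (B, N, msg_dim)
        q = self.query(local_obs).unsqueeze(1)            # (B, 1, E)
        k = self.key(neighbor_msgs)                       # (B, N, E)
        v = self.value(neighbor_msgs)                     # (B, N, E)
        scores = q @ k.transpose(1, 2) / k.shape[-1] ** 0.5
        attn = torch.softmax(scores, dim=-1)              # (B, 1, N)
        return (attn @ v).squeeze(1)                      # (B, E)

class HighLevelPolicy(nn.Module):
    """Infers a target-zone goal from local state and consensus embedding
    (here messages are assumed to be neighbors' observations)."""
    def __init__(self, obs_dim: int, embed_dim: int, goal_dim: int):
        super().__init__()
        self.consmac = ConsMACEncoder(obs_dim, obs_dim, embed_dim)
        self.head = nn.Sequential(nn.Linear(obs_dim + embed_dim, 128),
                                  nn.ReLU(), nn.Linear(128, goal_dim))

    def forward(self, local_obs, neighbor_msgs):
        consensus = self.consmac(local_obs, neighbor_msgs)
        return self.head(torch.cat([local_obs, consensus], dim=-1))

class LowLevelPolicy(nn.Module):
    """Tracks the goal while handling avoidance/navigation/formation."""
    def __init__(self, obs_dim: int, goal_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + goal_dim, 128),
                                 nn.ReLU(), nn.Linear(128, act_dim))

    def forward(self, local_obs, goal):
        return self.net(torch.cat([local_obs, goal], dim=-1))
```

In use, each agent would run the high-level policy at a coarse timescale to pick a goal and query the low-level policy every control step; the abstract indicates the low level is trained with AT-M (a MAPPO variant) and compressed via policy distillation, which this sketch does not reproduce.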

Source journal: IEEE Transactions on Neural Networks and Learning Systems
Categories: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
CiteScore: 23.80
Self-citation rate: 9.60%
Annual article output: 2102
Review time: 3-8 weeks
Journal description: The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.