An advantage duPLEX dueling multi-agent Q-learning algorithm for multi-UAV cooperative target search in unknown environments

IF 3.5; Zone 2 (Computer Science); JCR Q2 (COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS)
Xiaoran Kong, Jianyong Yang, Xinghua Chai, Yatong Zhou
Simulation Modelling Practice and Theory, Volume 142, Article 103118
Published: 2025-04-08
DOI: 10.1016/j.simpat.2025.103118
URL: https://www.sciencedirect.com/science/article/pii/S1569190X2500053X
Citations: 0

Abstract

Cooperative target search by multiple unmanned aerial vehicles (UAVs) has been extensively applied in post-disaster relief and surveillance tasks. However, achieving efficient cooperative target search in unknown environments without prior information is extremely challenging. In this study, a novel multi-agent deep reinforcement learning (MADRL)-based approach is proposed to enable UAVs to execute target search in unknown three-dimensional (3D) environments. Specifically, the target search problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), in which each UAV maintains its own target existence probability map and merges it with those of other UAVs within communication range to enhance the UAVs' perception of the task environment. Then, an improved duPLEX dueling multi-agent Q-learning (QPLEX) algorithm, called Advantage QPLEX, is proposed to make the optimal decision for multi-UAV target search. Advantage QPLEX guides UAVs to focus on the advantage steps during the search to improve search efficiency, and directs UAVs to select the advantage action in each step for a greater return. In addition, a novel reward function is designed for cooperative target search problems to drive UAVs to explore and exploit the environmental information efficiently. Extensive simulations conducted on AirSim validate that Advantage QPLEX outperforms existing algorithms in terms of coverage rate and search rate.
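The abstract does not say how a UAV updates its target existence probability map or how maps are merged, so the following is only a minimal sketch of one common way to do it, not the authors' method. It assumes a 2D grid, a binary sensor with hypothetical detection and false-alarm probabilities, a Bayesian per-cell update, and fusion by summing log-odds evidence under an independence assumption. The names `bayes_update`, `merge_maps`, and `coverage_rate`, and the definition of coverage as the fraction of cells whose probability has left the prior, are all illustrative assumptions.

```python
import math
from typing import List

Grid = List[List[float]]  # per-cell probability that a target is present

# Assumed sensor model (hypothetical values, not taken from the paper).
P_DETECT = 0.9       # P(sensor reports target | target present)
P_FALSE_ALARM = 0.1  # P(sensor reports target | no target)
PRIOR = 0.5          # uninformed prior for every cell


def logit(p: float) -> float:
    """Log-odds of p, clamped so that 0 and 1 do not cause a math error."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bayes_update(p: float, detected: bool) -> float:
    """Posterior target probability for one cell after one observation."""
    if detected:
        num = P_DETECT * p
        den = num + P_FALSE_ALARM * (1.0 - p)
    else:
        num = (1.0 - P_DETECT) * p
        den = num + (1.0 - P_FALSE_ALARM) * (1.0 - p)
    return num / den


def merge_maps(maps: List[Grid]) -> Grid:
    """Fuse the maps of UAVs within communication range.

    Each map contributes its log-odds evidence relative to the shared prior;
    this assumes the UAVs' observations are independent.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[PRIOR] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            evidence = sum(logit(m[r][c]) - logit(PRIOR) for m in maps)
            fused[r][c] = sigmoid(logit(PRIOR) + evidence)
    return fused


def coverage_rate(grid: Grid, tol: float = 1e-9) -> float:
    """Fraction of cells whose probability has moved away from the prior
    (an assumed definition of coverage, used only for illustration)."""
    cells = [p for row in grid for p in row]
    return sum(abs(p - PRIOR) > tol for p in cells) / len(cells)


# Two UAVs start from the same uninformed 2x2 map.
uav_a = [[PRIOR, PRIOR], [PRIOR, PRIOR]]
uav_b = [[PRIOR, PRIOR], [PRIOR, PRIOR]]
uav_a[0][0] = bayes_update(uav_a[0][0], detected=True)   # A reports a target at (0, 0)
uav_b[0][0] = bayes_update(uav_b[0][0], detected=True)   # B confirms it
uav_b[1][1] = bayes_update(uav_b[1][1], detected=False)  # B finds (1, 1) empty
shared = merge_maps([uav_a, uav_b])
```

With these assumed sensor values, two agreeing detections raise the fused probability at cell (0, 0) from 0.9 in each individual map to 81/82, which illustrates why sharing maps can sharpen each UAV's picture of the environment. The learning part of the method (the QPLEX mixing network and the reward function) is not covered by this sketch.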
Source journal
Simulation Modelling Practice and Theory (Engineering and Technology: Computer Science, Interdisciplinary Applications)
CiteScore: 9.80
Self-citation rate: 4.80%
Articles published: 142
Review time: 21 days
About the journal: The journal Simulation Modelling Practice and Theory provides a forum for original, high-quality papers dealing with any aspect of systems simulation and modelling. The journal aims at being a reference and a powerful tool to all those professionally active and/or interested in the methods and applications of simulation. Submitted papers will be peer reviewed and must significantly contribute to modelling and simulation in general or use modelling and simulation in application areas. Paper submission is solicited on:
• theoretical aspects of modelling and simulation, including formal modelling, model-checking, random number generators, sensitivity analysis, variance reduction techniques, experimental design, meta-modelling, methods and algorithms for validation and verification, selection and comparison procedures, etc.;
• methodology and application of modelling and simulation in any area, including computer systems, networks, real-time and embedded systems, mobile and intelligent agents, manufacturing and transportation systems, management, engineering, biomedical engineering, economics, ecology and environment, education, transaction handling, etc.;
• simulation languages and environments, including those specific to distributed computing, grid computing, high performance computers or computer networks, etc.;
• distributed and real-time simulation, simulation interoperability;
• tools for high performance computing simulation, including dedicated architectures and parallel computing.