Xiaoran Kong , Jianyong Yang , Xinghua Chai , Yatong Zhou
{"title":"一种面向未知环境下多无人机协同目标搜索的双工多智能体q学习算法","authors":"Xiaoran Kong , Jianyong Yang , Xinghua Chai , Yatong Zhou","doi":"10.1016/j.simpat.2025.103118","DOIUrl":null,"url":null,"abstract":"<div><div>Multiple unmanned aerial vehicles (UAVs) cooperative target search has been extensively applied in post-disaster relief and surveillance tasks. However, achieving efficient cooperative target search in unknown environments without prior information is extremely challenging. In the study, a novel multi-agent deep reinforcement learning (MADRL)-based approach is proposed to enable UAVs to execute target search in the three-dimensional (3D) unknown environments. Specifically, the target search problem is formulated as a decentralized partially observable Markov decision processes (Dec-POMDP), where each UAV maintains its own target existence probability map and merges with those of other UAVs within communication range to enhance UAVs’ perception of task environment. Then, an improved duPLEX dueling multi-agent Q-learning (QPLEX) algorithm called Advantage QPLEX is proposed to make the optimal decision for multiple UAVs target search. The Advantage QPLEX can guide UAVs to focus on the advantage steps during the search to improve search efficiency, and direct UAVs to select the advantage action in each step for a greater return. In addition, a novel reward function is well-designed for cooperative target search problems to drive UAVs to explore and utilize the environmental information efficiently. 
Extensive simulations conducted on the Airsim validate that the Advantage QPLEX outperforms the existing algorithms in terms of the coverage rate and search rate.</div></div>","PeriodicalId":49518,"journal":{"name":"Simulation Modelling Practice and Theory","volume":"142 ","pages":"Article 103118"},"PeriodicalIF":3.5000,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An advantage duPLEX dueling multi-agent Q-learning algorithm for multi-UAV cooperative target search in unknown environments\",\"authors\":\"Xiaoran Kong , Jianyong Yang , Xinghua Chai , Yatong Zhou\",\"doi\":\"10.1016/j.simpat.2025.103118\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Multiple unmanned aerial vehicles (UAVs) cooperative target search has been extensively applied in post-disaster relief and surveillance tasks. However, achieving efficient cooperative target search in unknown environments without prior information is extremely challenging. In the study, a novel multi-agent deep reinforcement learning (MADRL)-based approach is proposed to enable UAVs to execute target search in the three-dimensional (3D) unknown environments. Specifically, the target search problem is formulated as a decentralized partially observable Markov decision processes (Dec-POMDP), where each UAV maintains its own target existence probability map and merges with those of other UAVs within communication range to enhance UAVs’ perception of task environment. Then, an improved duPLEX dueling multi-agent Q-learning (QPLEX) algorithm called Advantage QPLEX is proposed to make the optimal decision for multiple UAVs target search. The Advantage QPLEX can guide UAVs to focus on the advantage steps during the search to improve search efficiency, and direct UAVs to select the advantage action in each step for a greater return. 
In addition, a novel reward function is well-designed for cooperative target search problems to drive UAVs to explore and utilize the environmental information efficiently. Extensive simulations conducted on the Airsim validate that the Advantage QPLEX outperforms the existing algorithms in terms of the coverage rate and search rate.</div></div>\",\"PeriodicalId\":49518,\"journal\":{\"name\":\"Simulation Modelling Practice and Theory\",\"volume\":\"142 \",\"pages\":\"Article 103118\"},\"PeriodicalIF\":3.5000,\"publicationDate\":\"2025-04-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Simulation Modelling Practice and Theory\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1569190X2500053X\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Simulation Modelling Practice and Theory","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1569190X2500053X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
An advantage duPLEX dueling multi-agent Q-learning algorithm for multi-UAV cooperative target search in unknown environments
Multi-UAV cooperative target search has been widely applied in post-disaster relief and surveillance tasks. However, achieving efficient cooperative target search in unknown environments without prior information is extremely challenging. In this study, a novel multi-agent deep reinforcement learning (MADRL)-based approach is proposed to enable unmanned aerial vehicles (UAVs) to execute target search in three-dimensional (3D) unknown environments. Specifically, the target search problem is formulated as a decentralized partially observable Markov decision process (Dec-POMDP), in which each UAV maintains its own target-existence probability map and merges it with those of other UAVs within communication range to enhance the UAVs’ perception of the task environment. Then, an improved duPLEX dueling multi-agent Q-learning (QPLEX) algorithm, called Advantage QPLEX, is proposed to make optimal decisions for multi-UAV target search. Advantage QPLEX guides UAVs to focus on advantageous steps during the search to improve search efficiency, and directs them to select the advantageous action at each step for a greater return. In addition, a reward function is carefully designed for the cooperative target search problem to drive UAVs to explore and exploit environmental information efficiently. Extensive simulations conducted on AirSim validate that Advantage QPLEX outperforms existing algorithms in terms of coverage rate and search rate.
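The abstract describes each UAV maintaining a target-existence probability map and merging it with the maps of neighbours within communication range. The paper does not publish its exact fusion rule, so the sketch below uses a standard independent-evidence Bayesian fusion in log-odds space as an illustrative assumption; all function names here are hypothetical.

```python
import math

def fuse_cell(probs, eps=1e-6):
    """Fuse per-cell target-existence probabilities in log-odds space
    (independent-evidence Bayesian fusion; an illustrative assumption,
    not necessarily the rule used in the paper)."""
    log_odds = 0.0
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        log_odds += math.log(p / (1.0 - p))
    return 1.0 / (1.0 + math.exp(-log_odds))

def fuse_maps(own_map, neighbour_maps):
    """Merge a UAV's own grid map cell-by-cell with maps received from
    neighbours inside communication range."""
    rows, cols = len(own_map), len(own_map[0])
    return [
        [fuse_cell([own_map[i][j]] + [m[i][j] for m in neighbour_maps])
         for j in range(cols)]
        for i in range(rows)
    ]
```

Under this rule, agreeing observations reinforce each other: two UAVs each reporting 0.8 for a cell fuse to about 0.94, while conflicting reports cancel toward 0.5.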
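The "select the advantageous action at each step" behaviour rests on the dueling value decomposition that QPLEX builds on. A minimal per-agent sketch of that decomposition is shown below; QPLEX's full duplex dueling mixing network is considerably more elaborate, and the helper names here are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Dueling decomposition Q(s, a) = V(s) + A(s, a) - max_a' A(s, a').
    Subtracting the maximum advantage makes the greedy action's Q equal
    V(s), keeping the value/advantage split identifiable (the standard
    dueling trick; a simplification of QPLEX's architecture)."""
    max_adv = max(advantages)
    return [state_value + a - max_adv for a in advantages]

def greedy_action(state_value, advantages):
    """Return the index of the action with the highest Q-value."""
    q = dueling_q_values(state_value, advantages)
    return max(range(len(q)), key=q.__getitem__)
```

With this normalisation, the chosen action's Q-value equals the state value, while every other action is penalised by its advantage gap.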
Journal introduction:
The journal Simulation Modelling Practice and Theory provides a forum for original, high-quality papers dealing with any aspect of systems simulation and modelling.
The journal aims to be a reference and a powerful tool for all those professionally active in and/or interested in the methods and applications of simulation. Submitted papers are peer reviewed and must significantly contribute to modelling and simulation in general or use modelling and simulation in application areas.
Paper submission is solicited on:
• theoretical aspects of modelling and simulation including formal modelling, model-checking, random number generators, sensitivity analysis, variance reduction techniques, experimental design, meta-modelling, methods and algorithms for validation and verification, selection and comparison procedures etc.;
• methodology and application of modelling and simulation in any area, including computer systems, networks, real-time and embedded systems, mobile and intelligent agents, manufacturing and transportation systems, management, engineering, biomedical engineering, economics, ecology and environment, education, transaction handling, etc.;
• simulation languages and environments including those specific to distributed computing, grid computing, high performance computers or computer networks, etc.;
• distributed and real-time simulation, simulation interoperability;
• tools for high performance computing simulation, including dedicated architectures and parallel computing.