Deep Reinforcement Learning-based Collaborative Multi-UAV Coverage Path Planning

Journal of Physics: Conference Series, published 2024-08-01, DOI: 10.1088/1742-6596/2833/1/012017

Boquan Zhang, Tian Jing, Xiang Lin, Yanru Cui, Yifan Zhu, Zhi Zhu



Abstract

The coverage path planning problem has attracted significant research attention because of its wide applicability and practical value in fields such as logistics and distribution, smart homes, and unmanned vehicles. This paper studies coverage path planning under multi-UAV collaboration, with the goal of maximizing coverage of the mission area within a given time. To address this problem, we propose a multi-objective optimization model and reformulate it within the framework of a Decentralized Partially Observable Markov Decision Process (Dec-POMDP). We then solve it with a multi-agent deep reinforcement learning (MADRL) method. Specifically, we introduce the ε-Multi-Agent Twin Delayed Deep Deterministic Policy Gradient (ε-MADT3), which extends MATD3 with an exploration coefficient. This coefficient gradually decays as the number of iterations increases, balancing exploration against exploitation. Extensive simulation results show that ε-MADT3 outperforms the baseline algorithm in both coverage rate and number of collisions.
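The abstract does not specify the form of the decaying exploration coefficient, the coverage metric, or how collisions are counted. The Python sketch below illustrates one plausible reading under explicit assumptions; it is not the authors' implementation. The following choices are assumptions: ε decays exponentially toward a floor, and with probability ε an agent takes a uniformly random continuous action. Otherwise the agent uses its deterministic actor output plus clipped Gaussian noise, as in TD3-style exploration. The mission area is discretized into a grid, coverage rate is the fraction of cells within a sensing radius of any visited UAV position, and a collision is two UAVs occupying the same cell at the same time step. All names (`epsilon_at`, `select_action`, `coverage_rate`, `count_collisions`) and parameter values are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Set, Tuple

Cell = Tuple[int, int]


def epsilon_at(iteration: int, eps_start: float = 1.0, eps_min: float = 0.05,
               decay_rate: float = 1e-3) -> float:
    """Hypothetical exploration coefficient: exponential decay toward eps_min."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay_rate * iteration)


def select_action(policy: Callable[[Sequence[float]], List[float]],
                  observation: Sequence[float], iteration: int,
                  rng: random.Random, low: float = -1.0, high: float = 1.0,
                  noise_std: float = 0.1) -> List[float]:
    """Assumed epsilon-mixed exploration for one agent's continuous action."""
    action = policy(observation)
    if rng.random() < epsilon_at(iteration):
        # Explore: sample a uniformly random action of the same dimension.
        return [rng.uniform(low, high) for _ in action]
    # Exploit: deterministic actor output plus Gaussian noise, clipped to bounds.
    return [min(high, max(low, a + rng.gauss(0.0, noise_std))) for a in action]


def coverage_rate(trajectories: Sequence[Sequence[Cell]], width: int,
                  height: int, sensing_radius: int = 1) -> float:
    """Fraction of grid cells within sensing_radius of any visited UAV cell."""
    covered: Set[Cell] = set()
    r = sensing_radius
    for trajectory in trajectories:
        for x, y in trajectory:
            for dx in range(-r, r + 1):
                for dy in range(-r, r + 1):
                    cx, cy = x + dx, y + dy
                    inside = 0 <= cx < width and 0 <= cy < height
                    if inside and dx * dx + dy * dy <= r * r:
                        covered.add((cx, cy))
    return len(covered) / (width * height)


def count_collisions(trajectories: Sequence[Sequence[Cell]]) -> int:
    """Count UAV pairs sharing a cell at the same time step (equal-length paths)."""
    collisions = 0
    for positions in zip(*trajectories):
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                if positions[i] == positions[j]:
                    collisions += 1
    return collisions


def fixed_policy(observation: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained actor network."""
    return [0.5, -0.5]


if __name__ == "__main__":
    rng = random.Random(42)
    print(epsilon_at(0), epsilon_at(5000))
    print(select_action(fixed_policy, [0.0, 0.0], iteration=10_000, rng=rng))
    uav_a = [(0, 0), (1, 0), (2, 0)]
    uav_b = [(4, 4), (3, 4), (2, 0)]
    print(coverage_rate([uav_a, uav_b], width=5, height=5, sensing_radius=0))
    print(count_collisions([uav_a, uav_b]))
```

In this sketch a large ε early in training makes random actions dominate. As ε approaches its floor, the agents mostly follow their actors with small perturbations, which is the exploration/exploitation trade-off the abstract describes. In the usage example, the two hypothetical trajectories cover 5 of 25 cells and meet once at (2, 0) in the final step.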

