Multi-Agent Reinforcement Learning for Zero-Shot Coverage Path Planning with Dynamic UAV Networks

Impact Factor 5.2 | CAS Tier 2 (Computer Science) | JCR Q1, AUTOMATION & CONTROL SYSTEMS
José P. Carvalho, A. Pedro Aguiar
{"title":"Multi-Agent Reinforcement Learning for Zero-Shot Coverage Path Planning with Dynamic UAV Networks","authors":"José P. Carvalho,&nbsp;A. Pedro Aguiar","doi":"10.1016/j.robot.2025.105163","DOIUrl":null,"url":null,"abstract":"<div><div>Recent advancements in autonomous systems have enabled the development of intelligent multi-robot systems for dynamic environments. Unmanned Aerial Vehicles play an important role in multi-robot applications such as precision agriculture, search-and-rescue, and wildfire monitoring, all of which rely on solving the coverage path planning problem. While Multi-Agent Coverage Path Planning approaches have shown potential, many existing methods lack the scalability and adaptability needed for diverse and dynamic scenarios. This paper presents a decentralized Multi-Agent Coverage Path Planning framework based on Multi-Agent Reinforcement Learning with parameter sharing and Centralized Training with Decentralized Execution. The framework incorporates a customized Rainbow Deep-Q Network, a size-invariant reward function, and a robustness and safety filter to ensure completeness and reliability in dynamic environments. Our training pipeline combines curriculum learning, domain randomization, and transfer learning, enabling the model to generalize to unseen scenarios. We demonstrate zero-shot generalization on scenarios with significantly larger maps, an increased number of obstacles, and a varying number of agents compared to what is seen during training. Furthermore, the models can also adapt to more structured maps and handle different tasks, such as search-and-rescue, without the need for retraining.</div></div>","PeriodicalId":49592,"journal":{"name":"Robotics and Autonomous Systems","volume":"195 ","pages":"Article 105163"},"PeriodicalIF":5.2000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics and Autonomous Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S092188902500260X","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

Recent advancements in autonomous systems have enabled the development of intelligent multi-robot systems for dynamic environments. Unmanned Aerial Vehicles play an important role in multi-robot applications such as precision agriculture, search-and-rescue, and wildfire monitoring, all of which rely on solving the coverage path planning problem. While Multi-Agent Coverage Path Planning approaches have shown potential, many existing methods lack the scalability and adaptability needed for diverse and dynamic scenarios. This paper presents a decentralized Multi-Agent Coverage Path Planning framework based on Multi-Agent Reinforcement Learning with parameter sharing and Centralized Training with Decentralized Execution. The framework incorporates a customized Rainbow Deep-Q Network, a size-invariant reward function, and a robustness and safety filter to ensure completeness and reliability in dynamic environments. Our training pipeline combines curriculum learning, domain randomization, and transfer learning, enabling the model to generalize to unseen scenarios. We demonstrate zero-shot generalization on scenarios with significantly larger maps, an increased number of obstacles, and a varying number of agents compared to what is seen during training. Furthermore, the models can also adapt to more structured maps and handle different tasks, such as search-and-rescue, without the need for retraining.
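The abstract gives no implementation details, but the combination of parameter sharing and decentralized execution it describes can be illustrated with a minimal sketch: every agent evaluates the same Q-network (here with a dueling head, one of the Rainbow components) on its own local observation at execution time, with no communication between agents. The observation layout, action set, and layer sizes below are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a parameter-shared Q-network used for
# decentralized execution. Each agent feeds only its own local observation into
# the single shared network; the dueling head is one Rainbow ingredient shown
# for illustration. Observation channels, action count, and sizes are assumptions.
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Shared dueling Q-network: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, obs_channels: int = 4, n_actions: int = 5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(obs_channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # pooling keeps the head size fixed for any map size
            nn.Flatten(),
        )
        hidden = 32 * 4 * 4
        self.value = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(nn.Linear(hidden, 128), nn.ReLU(), nn.Linear(128, n_actions))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.encoder(obs)
        v, a = self.value(z), self.advantage(z)
        return v + a - a.mean(dim=1, keepdim=True)


def decentralized_step(qnet: DuelingQNet, local_obs: list[torch.Tensor]) -> list[int]:
    """Each agent greedily picks an action from its own observation using the
    shared parameter set; no inter-agent communication at execution time."""
    with torch.no_grad():
        return [int(qnet(o.unsqueeze(0)).argmax(dim=1)) for o in local_obs]


if __name__ == "__main__":
    net = DuelingQNet()
    # Three agents, each with a hypothetical 4-channel local grid observation
    # (e.g. obstacles, coverage map, own position, teammate positions).
    observations = [torch.rand(4, 16, 16) for _ in range(3)]
    print(decentralized_step(net, observations))
```

The adaptive pooling layer is one simple way to keep the network's head independent of map size; the paper's size-invariant reward function and robustness and safety filter are not modeled in this sketch.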
Source Journal
Robotics and Autonomous Systems (Engineering & Technology: Robotics)
CiteScore: 9.00
Self-citation rate: 7.00%
Articles per year: 164
Review time: 4.5 months
Journal description: Robotics and Autonomous Systems carries articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of the journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems. The journal covers theoretical, computational, and experimental aspects of autonomous systems, or modules of such systems.