{"title":"Deep Reinforcement Learning for Zero-Shot Coverage Path Planning With Mobile Robots","authors":"José Pedro Carvalho;A. Pedro Aguiar","doi":"10.1109/JAS.2024.125064","DOIUrl":null,"url":null,"abstract":"The ability of mobile robots to plan and execute a path is foundational to various path-planning challenges, particularly Coverage Path Planning. While this task has been typically tackled with classical algorithms, these often struggle with flexibility and adaptability in unknown environments. On the other hand, recent advances in Reinforcement Learning offer promising approaches, yet a significant gap in the literature remains when it comes to generalization over a large number of parameters. This paper presents a unified, generalized framework for coverage path planning that leverages value-based deep reinforcement learning techniques. The novelty of the framework comes from the design of an observation space that accommodates different map sizes, an action masking scheme that guarantees safety and robustness while also serving as a learning-from-demonstration technique during training, and a unique reward function that yields value functions that are size-invariant. These are coupled with a curriculum learning-based training strategy and parametric environment randomization, enabling the agent to tackle complete or partial coverage path planning with perfect or incomplete knowledge while generalizing to different map sizes, configurations, sensor payloads, and sub-tasks. Our empirical results show that the algorithm can perform zero-shot learning scenarios at a near-optimal level in environments that follow a similar distribution as during training, outperforming a greedy heuristic by sixfold. Furthermore, in out-of-distribution environments, our method surpasses existing state-of-the-art algorithms in most zero-shot and all few-shot scenarios, paving the way for generalizable and adaptable path-planning algorithms.","PeriodicalId":54230,"journal":{"name":"Ieee-Caa Journal of Automatica Sinica","volume":"12 8","pages":"1594-1609"},"PeriodicalIF":19.2000,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ieee-Caa Journal of Automatica Sinica","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10869294/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
The ability of mobile robots to plan and execute a path is foundational to various path-planning challenges, particularly Coverage Path Planning. While this task has typically been tackled with classical algorithms, these often struggle with flexibility and adaptability in unknown environments. Recent advances in Reinforcement Learning, on the other hand, offer promising approaches, yet a significant gap remains in the literature when it comes to generalization over a large number of parameters. This paper presents a unified, generalized framework for coverage path planning that leverages value-based deep reinforcement learning techniques. The novelty of the framework comes from the design of an observation space that accommodates different map sizes, an action masking scheme that guarantees safety and robustness while also serving as a learning-from-demonstration technique during training, and a unique reward function that yields size-invariant value functions. These are coupled with a curriculum learning-based training strategy and parametric environment randomization, enabling the agent to tackle complete or partial coverage path planning with perfect or incomplete knowledge while generalizing to different map sizes, configurations, sensor payloads, and sub-tasks. Our empirical results show that the algorithm performs at a near-optimal level in zero-shot scenarios within environments that follow a distribution similar to the training distribution, outperforming a greedy heuristic by sixfold. Furthermore, in out-of-distribution environments, our method surpasses existing state-of-the-art algorithms in most zero-shot and all few-shot scenarios, paving the way for generalizable and adaptable path-planning algorithms.
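To make the action-masking idea concrete, the following is a minimal illustrative sketch (not the authors' implementation): at decision time, actions that would violate the safety condition are assigned a value of negative infinity before the greedy argmax over Q-values, so the policy can only ever select safe moves. The function name, the four-action layout, and the example numbers are assumptions made for illustration only.

```python
import numpy as np

def masked_greedy_action(q_values: np.ndarray, valid_mask: np.ndarray) -> int:
    """Pick the highest-valued action among those the mask marks as safe.

    q_values:   per-action value estimates from a value-based (DQN-style) head.
    valid_mask: boolean array, True where the action keeps the robot on free,
                in-bounds cells.
    """
    masked_q = np.where(valid_mask, q_values, -np.inf)  # unsafe actions can never win the argmax
    return int(np.argmax(masked_q))

# Hypothetical 4-action example (up, down, left, right):
q = np.array([0.8, 1.2, -0.3, 0.5])
mask = np.array([True, False, True, True])  # "down" would hit an obstacle
print(masked_greedy_action(q, mask))  # -> 0, i.e., "up", the best among the safe actions
```

During training, the same mask can be applied to the Q-value targets, which is one common way such a scheme also acts as an implicit demonstration signal; how the paper realizes this is detailed in the full text.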
Journal Introduction:
The IEEE/CAA Journal of Automatica Sinica is a reputable journal that publishes high-quality papers in English on original theoretical/experimental research and development in the field of automation. The journal covers a wide range of topics including automatic control, artificial intelligence and intelligent control, systems theory and engineering, pattern recognition and intelligent systems, automation engineering and applications, information processing and information systems, network-based automation, robotics, sensing and measurement, and navigation, guidance, and control.
Additionally, the journal is abstracted/indexed in several prominent databases including SCIE (Science Citation Index Expanded), EI (Engineering Index), Inspec, Scopus, SCImago, DBLP, CNKI (China National Knowledge Infrastructure), CSCD (Chinese Science Citation Database), and IEEE Xplore.