{"title":"Formal and scalable multi-robot coordination methods for long horizon tasks with time uncertainty","authors":"Carlos Azevedo , Pedro U. Lima","doi":"10.1016/j.robot.2025.105103","DOIUrl":null,"url":null,"abstract":"<div><div>Many real-world robotic applications, such as monitoring, inspection, and surveillance tasks, require effective multi-robot coordination over extended time horizons. These scenarios benefit from long-term planning and execution, and the ability to handle time uncertainty a priori significantly enhances efficiency in unpredictable environments. In this work, we introduce and compare two approaches for synthesizing coordination policies for multi-robot systems that account for time uncertainty and optimize performance over an infinite horizon. Both approaches are based on reasoning over a generalized stochastic Petri net with rewards (GSPNR) model and optimize the average reward criterion. The first approach is an exact method that provides formal guarantees on the synthesized policies and ensures convergence to the optimal policy. We evaluate this method in a solar farm inspection scenario, comparing its performance to discounted reward optimization methods and a carefully designed hand-crafted policy. The results demonstrate that, over the long term, the exact method outperforms these alternatives. However, its scalability is limited, as it cannot handle large state spaces. To address this limitation, we propose a second approach that uses an actor-critic deep reinforcement learning algorithm. This method learns policies directly within the GSPNR formalism and optimizes for the average reward criterion. We assess its performance in the same solar farm inspection scenario, and the results show that it outperforms proximal policy optimization methods. Moreover, it is capable of finding near-optimal solutions in models with state spaces five orders of magnitude larger than those tractable by the exact method.</div></div>","PeriodicalId":49592,"journal":{"name":"Robotics and Autonomous Systems","volume":"193 ","pages":"Article 105103"},"PeriodicalIF":5.2000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics and Autonomous Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0921889025002003","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Many real-world robotic applications, such as monitoring, inspection, and surveillance tasks, require effective multi-robot coordination over extended time horizons. These scenarios benefit from long-term planning and execution, and the ability to handle time uncertainty a priori significantly enhances efficiency in unpredictable environments. In this work, we introduce and compare two approaches for synthesizing coordination policies for multi-robot systems that account for time uncertainty and optimize performance over an infinite horizon. Both approaches are based on reasoning over a generalized stochastic Petri net with rewards (GSPNR) model and optimize the average reward criterion. The first approach is an exact method that provides formal guarantees on the synthesized policies and ensures convergence to the optimal policy. We evaluate this method in a solar farm inspection scenario, comparing its performance to discounted reward optimization methods and a carefully designed hand-crafted policy. The results demonstrate that, over the long term, the exact method outperforms these alternatives. However, its scalability is limited, as it cannot handle large state spaces. To address this limitation, we propose a second approach that uses an actor-critic deep reinforcement learning algorithm. This method learns policies directly within the GSPNR formalism and optimizes for the average reward criterion. We assess its performance in the same solar farm inspection scenario, and the results show that it outperforms proximal policy optimization methods. Moreover, it is capable of finding near-optimal solutions in models with state spaces five orders of magnitude larger than those tractable by the exact method.
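For readers unfamiliar with the distinction, the two criteria contrasted in the abstract can be written as follows; the notation here is a standard formulation, not reproduced from the paper:

```latex
% Average-reward criterion (optimized by both proposed approaches):
\rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{T-1} r_t\right]

% Discounted criterion (baseline methods), with \gamma \in (0,1):
J_{\gamma}^{\pi} \;=\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]
```

Unlike the discounted objective, the average-reward objective weights all time steps equally, which is why it is a natural fit for the infinite-horizon monitoring tasks described above. The sketch below illustrates the core update of a differential (average-reward) actor-critic in a tabular toy setting; it is a minimal illustration of the criterion, not the paper's deep RL architecture, and the toy environment, state/action sizes, and learning rates are all assumptions:

```python
import numpy as np

# Minimal tabular sketch of a differential (average-reward) actor-critic.
# Illustrative only: the paper learns policies over a GSPNR model with
# deep RL; this toy environment and all hyperparameters are assumptions.

n_states, n_actions = 10, 4
rng = np.random.default_rng(0)

V = np.zeros(n_states)                    # critic: differential state values
theta = np.zeros((n_states, n_actions))   # actor: softmax policy logits
rho = 0.0                                 # running average-reward estimate

alpha_v, alpha_pi, alpha_rho = 0.1, 0.05, 0.01

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def env_step(state, action):
    """Toy stand-in for the environment: reward favors action 0."""
    reward = 1.0 if action == 0 else 0.1
    return int(rng.integers(n_states)), reward

state = 0
for t in range(10_000):
    probs = softmax(theta[state])
    action = int(rng.choice(n_actions, p=probs))
    next_state, reward = env_step(state, action)

    # Differential TD error: no discount factor; rewards are measured
    # relative to the current average-reward estimate rho.
    delta = reward - rho + V[next_state] - V[state]

    rho += alpha_rho * delta                 # track long-run average reward
    V[state] += alpha_v * delta              # critic update
    grad = -probs                            # d/dtheta log pi(a|s) for softmax
    grad[action] += 1.0                      # ... is one-hot(a) minus probs
    theta[state] += alpha_pi * delta * grad  # actor update

    state = next_state

print(f"estimated average reward: {rho:.3f}")  # approaches 1.0 as the policy converges
```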
Journal introduction:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems.
Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.