{"title":"Formal and scalable multi-robot coordination methods for long horizon tasks with time uncertainty","authors":"Carlos Azevedo , Pedro U. Lima","doi":"10.1016/j.robot.2025.105103","DOIUrl":null,"url":null,"abstract":"<div><div>Many real-world robotic applications, such as monitoring, inspection, and surveillance tasks, require effective multi-robot coordination over extended time horizons. These scenarios benefit from long-term planning and execution, and the ability to handle time uncertainty a priori significantly enhances efficiency in unpredictable environments. In this work, we introduce and compare two approaches for synthesizing coordination policies for multi-robot systems that account for time uncertainty and optimize performance over an infinite horizon. Both approaches are based on reasoning over a generalized stochastic Petri net with rewards (GSPNR) model and optimize the average reward criterion. The first approach is an exact method that provides formal guarantees on the synthesized policies and ensures convergence to the optimal policy. We evaluate this method in a solar farm inspection scenario, comparing its performance to discounted reward optimization methods and a carefully designed hand-crafted policy. The results demonstrate that, over the long term, the exact method outperforms these alternatives. However, its scalability is limited, as it cannot handle large state spaces. To address this limitation, we propose a second approach that uses an actor-critic deep reinforcement learning algorithm. This method learns policies directly within the GSPNR formalism and optimizes for the average reward criterion. We assess its performance in the same solar farm inspection scenario, and the results show that it outperforms proximal policy optimization methods. Moreover, it is capable of finding near-optimal solutions in models with state spaces five orders of magnitude larger than those tractable by the exact method.</div></div>","PeriodicalId":49592,"journal":{"name":"Robotics and Autonomous Systems","volume":"193 ","pages":"Article 105103"},"PeriodicalIF":5.2000,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Robotics and Autonomous Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0921889025002003","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
Many real-world robotic applications, such as monitoring, inspection, and surveillance tasks, require effective multi-robot coordination over extended time horizons. These scenarios benefit from long-term planning and execution, and the ability to handle time uncertainty a priori significantly enhances efficiency in unpredictable environments. In this work, we introduce and compare two approaches for synthesizing coordination policies for multi-robot systems that account for time uncertainty and optimize performance over an infinite horizon. Both approaches are based on reasoning over a generalized stochastic Petri net with rewards (GSPNR) model and optimize the average reward criterion. The first approach is an exact method that provides formal guarantees on the synthesized policies and ensures convergence to the optimal policy. We evaluate this method in a solar farm inspection scenario, comparing its performance to discounted reward optimization methods and a carefully designed hand-crafted policy. The results demonstrate that, over the long term, the exact method outperforms these alternatives. However, its scalability is limited, as it cannot handle large state spaces. To address this limitation, we propose a second approach that uses an actor-critic deep reinforcement learning algorithm. This method learns policies directly within the GSPNR formalism and optimizes for the average reward criterion. We assess its performance in the same solar farm inspection scenario, and the results show that it outperforms proximal policy optimization methods. Moreover, it is capable of finding near-optimal solutions in models with state spaces five orders of magnitude larger than those tractable by the exact method.
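For readers unfamiliar with the distinction, the two criteria contrasted in the abstract can be written as follows; the notation here is a standard formulation, not reproduced from the paper:

```latex
% Average-reward criterion (optimized by both proposed approaches):
\rho^{\pi} \;=\; \lim_{T \to \infty} \frac{1}{T}\,
    \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{T-1} r_t\right]

% Discounted criterion (baseline methods), with \gamma \in (0,1):
J_{\gamma}^{\pi} \;=\; \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right]
```

Unlike the discounted objective, the average-reward objective weights all time steps equally, which is why it is a natural fit for the infinite-horizon monitoring tasks described above. The sketch below illustrates the core update of a differential (average-reward) actor-critic in a tabular toy setting; it is a minimal illustration of the criterion, not the paper's deep RL architecture, and the toy environment, state/action sizes, and learning rates are all assumptions:

```python
import numpy as np

# Minimal tabular sketch of a differential (average-reward) actor-critic.
# Illustrative only: the paper learns policies over a GSPNR model with
# deep RL; this toy environment and all hyperparameters are assumptions.

n_states, n_actions = 10, 4
rng = np.random.default_rng(0)

V = np.zeros(n_states)                    # critic: differential state values
theta = np.zeros((n_states, n_actions))   # actor: softmax policy logits
rho = 0.0                                 # running average-reward estimate

alpha_v, alpha_pi, alpha_rho = 0.1, 0.05, 0.01

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def env_step(state, action):
    """Toy stand-in for the environment: reward favors action 0."""
    reward = 1.0 if action == 0 else 0.1
    return int(rng.integers(n_states)), reward

state = 0
for t in range(10_000):
    probs = softmax(theta[state])
    action = int(rng.choice(n_actions, p=probs))
    next_state, reward = env_step(state, action)

    # Differential TD error: no discount factor; rewards are measured
    # relative to the current average-reward estimate rho.
    delta = reward - rho + V[next_state] - V[state]

    rho += alpha_rho * delta                 # track long-run average reward
    V[state] += alpha_v * delta              # critic update
    grad = -probs                            # d/dtheta log pi(a|s) for softmax
    grad[action] += 1.0                      # ... is one-hot(a) minus probs
    theta[state] += alpha_pi * delta * grad  # actor update

    state = next_state

print(f"estimated average reward: {rho:.3f}")  # approaches 1.0 as the policy converges
```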
Journal introduction:
Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory-based robot control and learning in the context of autonomous systems.
Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.