R-FAC: Resilient Value Function Factorization for Multirobot Efficient Search With Individual Failure Probabilities
Hongliang Guo, Qi Kang, Wei-Yun Yau, Chee-Meng Chew, Daniela Rus
IEEE Transactions on Robotics, vol. 41, pp. 3385-3401. Published: 2025-03-06. Impact factor: 9.4 (JCR Q1, Robotics).
DOI: 10.1109/TRO.2025.3567478
https://ieeexplore.ieee.org/document/10989574/
Citations: 0
Abstract
This article investigates the resilient multirobot efficient search problem (R-MuRES), which aims to coordinate multiple robots to detect a “nonadversarial” moving target in the minimal expected time. A distinguishing characteristic of R-MuRES is that an individual robot may malfunction and withdraw from the team during task execution. This results in a variable number of searchers in the deployment phase and means that team member failures must be considered during the planning stage, particularly in the training phase. We propose a resilient value function factorization (R-FAC) paradigm, which constructs the central value function from individual ones in a resilient manner, taking into account individual robots' failures, and ensures that the constructed central value function has the minimal mean squared temporal difference error across various team compositions. R-FAC stipulates that the individual global maximum principle is satisfied for every team configuration, so any functioning robot contributes positively to the remaining team as long as it executes the greedy policy with respect to its factorized individual value function. Subsequently, we introduce the variational value decomposition network (V2DN) as one instantiation of R-FAC. V2DN employs the log-sum-exp mechanism to construct the central value function from individual ones, enabling it to take a varying number of robots' individual value functions as inputs. We then explain why, specifically for the multirobot search task, the log-sum-exp mechanism is superior to the brute-force summation operation used in the canonical value decomposition network (VDN). We compare V2DN with state-of-the-art MuRES solutions as well as the vanilla VDN algorithm in two canonical MuRES testing environments and show that it achieves the best resiliency score when one or several individual robots quit the team during task execution. Furthermore, we validate V2DN with a real multirobot system in a self-constructed indoor environment as a proof of concept.
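The abstract does not give V2DN's exact mixing formula (any scaling, sign convention, or temperature), so the following is only a minimal sketch of the general idea: a plain log-sum-exp over whichever robots are still active, next to the VDN-style sum. The value tables and the helper names (`lse_mix`, `vdn_mix`, `greedy_joint_action`, `best_joint_action`) are hypothetical. The sketch checks by brute force that, for every team composition, each robot acting greedily on its own values also maximizes the mixed central value, which is the individual global maximum property the abstract refers to.

```python
import itertools
import math


def vdn_mix(q_values):
    """VDN-style mixing: plain sum of the active robots' values."""
    return sum(q_values)


def lse_mix(q_values):
    """Log-sum-exp mixing, computed stably by factoring out the max."""
    m = max(q_values)
    return m + math.log(sum(math.exp(q - m) for q in q_values))


def greedy_joint_action(q_tables, active):
    """Each active robot picks the argmax of its own value table."""
    return tuple(
        max(range(len(q_tables[i])), key=q_tables[i].__getitem__) for i in active
    )


def best_joint_action(q_tables, active, mix):
    """Brute-force argmax of the mixed central value over joint actions."""
    action_ranges = [range(len(q_tables[i])) for i in active]
    return max(
        itertools.product(*action_ranges),
        key=lambda joint: mix([q_tables[i][a] for i, a in zip(active, joint)]),
    )


# Hypothetical per-robot action values: three robots, three actions each.
q_tables = [
    [0.2, 1.5, -0.3],
    [0.9, 0.1, 0.4],
    [-1.0, -0.2, 0.7],
]

# Every non-empty team composition, i.e. every pattern of robot failures.
teams = [
    team
    for size in range(1, len(q_tables) + 1)
    for team in itertools.combinations(range(len(q_tables)), size)
]

for team in teams:
    greedy = greedy_joint_action(q_tables, team)
    # Both mixers are increasing in each input, so the greedy joint
    # action is also the maximizer of the mixed central value.
    assert best_joint_action(q_tables, team, lse_mix) == greedy
    assert best_joint_action(q_tables, team, vdn_mix) == greedy
```

Both mixers accept a list of any length, so a robot that drops out is simply left out of the input. The sketch only illustrates that consistency property; it does not reproduce the paper's argument for why log-sum-exp suits the search task better than summation.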
About the journal:
The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles.
Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.