R-FAC: Resilient Value Function Factorization for Multirobot Efficient Search With Individual Failure Probabilities

IF 9.4 · CAS Zone 1 (Computer Science) · Q1 ROBOTICS

IEEE Transactions on Robotics · Pub Date: 2025-03-06 · DOI: 10.1109/TRO.2025.3567478

Hongliang Guo;Qi Kang;Wei-Yun Yau;Chee-Meng Chew;Daniela Rus

IEEE Transactions on Robotics, vol. 41, pp. 3385-3401. https://ieeexplore.ieee.org/document/10989574/

Citations: 0

Abstract

This article investigates the resilient multirobot efficient search problem (R-MuRES), which aims at coordinating multiple robots to detect a “nonadversarial” moving target in the minimal expected time. One unique characteristic of R-MuRES is that an individual robot may malfunction and withdraw from the team during task execution. This results in a variable number of searchers in the deployment phase and entails that team member failures must be considered during the planning stage, particularly in the training phase. We propose a resilient value function factorization (R-FAC) paradigm, which constructs the central value function from individual ones in a resilient manner, taking into account individual robots' failures, and ensures that the constructed central value function has the minimal mean squared temporal difference error across various team compositions. R-FAC stipulates that the individual global maximum principle is satisfied for every team configuration, so any functioning robot contributes positively to the remaining team as long as it executes the greedy policy with respect to its factorized individual value function. Subsequently, we introduce the variational value decomposition network (V2DN) as one instantiation of R-FAC. V2DN employs the $\log$-sum-$\exp$ mechanism to construct the central value function from individual ones, enabling it to take a varying number of robots' individual value functions as inputs. We then explain why, specifically for the multirobot search task, the $\log$-sum-$\exp$ mechanism is superior to the brute-force summation operation used in the canonical value decomposition network (VDN). We compare V2DN with state-of-the-art MuRES solutions as well as the vanilla VDN algorithm in two canonical MuRES testing environments and show that it achieves the best resiliency score when one or several individual robots quit the team during task execution. Furthermore, we validate V2DN with a real multirobot system in a self-constructed indoor environment as a proof of concept.
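The contrast between summation and log-sum-exp mixing can be illustrated with a small sketch. The abstract does not give V2DN's exact formula (any temperature, weighting, or sign convention is unknown here), so the code below assumes the plainest form, log of the sum of exponentials over the robots still active. All function names and the toy Q-values are hypothetical and chosen only for illustration. The sketch shows two points from the abstract: the mixer accepts any number of active robots, and because it is increasing in each input, each robot's individual greedy action matches the joint maximizer for every team composition checked.

```python
import itertools
import math


def vdn_mix(q_values, active):
    """VDN-style mixing: plain sum over the robots that are still active."""
    return sum(q for q, alive in zip(q_values, active) if alive)


def lse_mix(q_values, active):
    """Assumed log-sum-exp mixing over active robots (max-shifted for stability)."""
    alive_q = [q for q, alive in zip(q_values, active) if alive]
    if not alive_q:
        raise ValueError("at least one robot must be active")
    m = max(alive_q)
    return m + math.log(sum(math.exp(q - m) for q in alive_q))


def greedy_joint_action(q_tables, active):
    """Each active robot picks the argmax of its own Q-values; failed robots get None."""
    return tuple(
        max(range(len(q)), key=q.__getitem__) if alive else None
        for q, alive in zip(q_tables, active)
    )


def brute_force_joint_action(q_tables, active, mix):
    """Exhaustively maximize the mixed value over all joint actions of active robots."""
    choices = [range(len(q)) if alive else [None] for q, alive in zip(q_tables, active)]

    def value(joint):
        # Failed robots get a placeholder 0.0; the mixer ignores them via `active`.
        qs = [q[a] if a is not None else 0.0 for q, a in zip(q_tables, joint)]
        return mix(qs, active)

    return max(itertools.product(*choices), key=value)


# Toy per-robot Q-values over three actions (made-up numbers).
q_tables = [[0.1, 0.9, 0.3], [0.7, 0.2, 0.4], [0.5, 0.6, 0.8]]

# Check the greedy/joint agreement for several team compositions.
for active in ([True, True, True], [True, False, True], [False, False, True]):
    greedy = greedy_joint_action(q_tables, active)
    assert greedy == brute_force_joint_action(q_tables, active, lse_mix)
    assert greedy == brute_force_joint_action(q_tables, active, vdn_mix)
```

In this toy setting both mixers satisfy the greedy/joint agreement, so the sketch does not by itself show why log-sum-exp is preferable for search; that argument is made in the article and is not reproduced here.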




Source journal

IEEE Transactions on Robotics (Engineering & Technology: Robotics)

CiteScore: 14.90

Self-citation rate: 5.10%

Articles published: 259

Review time: 6.0 months

Journal description: The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles. Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.