Learning multi-robot task allocation using capsule networks and attention mechanism

IF 5.2 Zone 2 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS

Robotics and Autonomous Systems Pub Date : 2025-06-23 DOI:10.1016/j.robot.2025.105085

Steve Paul, Souma Chowdhury

Citations: 0

Abstract

Learning multi-robot task allocation using capsule networks and attention mechanism

This paper presents a new graph reinforcement learning (RL) architecture for solving multi-robot task allocation (MRTA) problems without requiring tedious heuristics. Multi-feature tasks are abstracted as nodes in an undirected graph. The primary goal is not only to generalize across unseen problems of similar size but also to scale to problems with much larger task spaces without retraining, which could otherwise be particularly expensive when simulating multi-robot operations. Drawing inspiration from the emerging paradigm of learning to solve combinatorial optimization (CO) problems, the paper presents a new encoder–decoder architecture called Capsule Attention-based Mechanism (CAPAM) to achieve this goal. Specifically, the encoder is a graph capsule convolutional network, which produces permutation-invariant embeddings that capture the local and global structure of the task graph by using higher-order statistical moments of the node feature vectors. This encoded information is combined with a context component that encodes the mission and robot states, and is processed by the decoder, which computes the probability of a robot selecting each available task. The CAPAM model is trained with a policy-gradient method based on Proximal Policy Optimization. When evaluated on unseen scenarios, CAPAM demonstrates comparable task completion performance and faster decision-making compared to standard non-learning-based online MRTA methods. It also demonstrates substantial gains in generalizability and (task) scalability over a popular approach for learning to solve CO problems (the pure attention mechanism), and preserves this performance advantage even under partial observation scenarios.
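The encoder and decoder ideas in the abstract can be illustrated with a toy sketch. This is not the authors' CAPAM implementation: it has no learned weights, the neighbourhood averaging is a simple stand-in for a graph capsule convolution layer, and the function names, the `context` vector, and the dot-product scoring are all hypothetical choices made for illustration. It only shows how element-wise moments of neighbouring node features might form node embeddings, and how a masked softmax can turn scores into selection probabilities over the available tasks.

```python
import math


def moment_embeddings(features, adjacency, max_order=2):
    """Toy node embeddings from moments of neighbourhood features.

    features:  one feature list per task node
    adjacency: symmetric 0/1 matrix of the undirected task graph
    Returns, per node, the mean of x**p over its neighbourhood
    for p = 1..max_order, concatenated into one vector.
    """
    n, d = len(features), len(features[0])
    embeddings = []
    for i in range(n):
        # Neighbourhood = the node itself plus its graph neighbours.
        hood = [i] + [j for j in range(n) if j != i and adjacency[i][j]]
        vec = []
        for p in range(1, max_order + 1):
            for k in range(d):
                vec.append(sum(features[j][k] ** p for j in hood) / len(hood))
        embeddings.append(vec)
    return embeddings


def task_probabilities(embeddings, context, available):
    """Masked softmax over task scores (score = embedding . context)."""
    if not any(available):
        raise ValueError("no task is available")
    scores = [sum(e * c for e, c in zip(emb, context)) for emb in embeddings]
    # Unavailable tasks get -inf so their probability is exactly zero.
    masked = [s if ok else -math.inf for s, ok in zip(scores, available)]
    top = max(masked)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in masked]
    total = sum(weights)
    return [w / total for w in weights]


# Toy task graph: three tasks on a path 0 - 1 - 2, two features each.
features = [[0.0, 1.0], [2.0, 0.5], [1.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
embeddings = moment_embeddings(features, adjacency)

# Hypothetical context vector standing in for mission and robot state.
context = [1.0, 0.0, 0.5, 0.0]
probs = task_probabilities(embeddings, context, [True, False, True])
print(probs)
```

Because each embedding depends only on a node's neighbourhood and not on its index, relabelling the tasks merely reorders the embeddings, and the embedding size does not depend on the number of tasks. That property is what allows the same decoder to be applied to graphs with more tasks, in the spirit of the scalability goal described above.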

Source journal

Robotics and Autonomous Systems (Engineering & Technology: Robotics)

CiteScore: 9.00

Self-citation rate: 7.00%

Articles published: 164

Review time: 4.5 months

Journal description: Robotics and Autonomous Systems will carry articles describing fundamental developments in the field of robotics, with special emphasis on autonomous systems. An important goal of this journal is to extend the state of the art in both symbolic and sensory based robot control and learning in the context of autonomous systems. Robotics and Autonomous Systems will carry articles on the theoretical, computational and experimental aspects of autonomous systems, or modules of such systems.