InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation
Andrew Lee, Ian Chuang, Ling-Yuan Chen, Iman Soltani
arXiv - CS - Robotics, 2024-09-12. https://doi.org/arxiv-2409.07914
Abstract
We present InterACT: Inter-dependency aware Action Chunking with Hierarchical Attention Transformers, a novel imitation learning framework for bimanual manipulation that integrates hierarchical attention to capture inter-dependencies between dual-arm joint states and visual inputs. InterACT consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both designed to enhance information aggregation and coordination. The encoder processes multi-modal inputs through segment-wise and cross-segment attention mechanisms, while the decoder leverages synchronization blocks to refine individual action predictions, providing the counterpart's prediction as context. Our experiments on a variety of simulated and real-world bimanual manipulation tasks demonstrate that InterACT significantly outperforms existing methods. Detailed ablation studies validate the contributions of key components of our work, including the impact of CLS tokens, cross-segment encoders, and synchronization blocks.
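
The abstract describes two attention mechanisms: segment-wise attention that summarizes each input segment with a CLS token before cross-segment attention fuses the summaries, and a decoder synchronization block in which each arm's action chunk attends to the counterpart arm's prediction as context. The PyTorch sketch below illustrates this wiring under stated assumptions; the module names (SegmentEncoder, CrossSegmentEncoder, SyncBlock), dimensions, and layer counts are illustrative choices of ours, not the paper's implementation.

```python
# Minimal sketch of an InterACT-style hierarchy (assumptions, not the
# authors' code): segment-wise attention with CLS tokens, cross-segment
# attention over the CLS summaries, and a synchronization step between
# per-arm action chunks.
import torch
import torch.nn as nn

D = 256  # shared embedding width (assumed for this sketch)

class SegmentEncoder(nn.Module):
    """Segment-wise attention: self-attention within one input segment
    (e.g., one arm's joint-state tokens), summarized by a prepended CLS token."""
    def __init__(self, d=D, heads=4, layers=2):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, d))
        block = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.enc = nn.TransformerEncoder(block, layers)

    def forward(self, tokens):                      # tokens: (B, T, D)
        cls = self.cls.expand(tokens.size(0), -1, -1)
        out = self.enc(torch.cat([cls, tokens], dim=1))
        return out[:, 0], out[:, 1:]                # CLS summary, refined tokens

class CrossSegmentEncoder(nn.Module):
    """Cross-segment attention: the CLS summaries of all segments attend to
    one another, propagating dependencies across arms and modalities."""
    def __init__(self, d=D, heads=4, layers=2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.enc = nn.TransformerEncoder(block, layers)

    def forward(self, summaries):                   # summaries: (B, S, D)
        return self.enc(summaries)

class SyncBlock(nn.Module):
    """Synchronization block: one arm's action chunk cross-attends to the
    counterpart arm's current prediction, which serves as context."""
    def __init__(self, d=D, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm = nn.LayerNorm(d)

    def forward(self, own, counterpart):            # both: (B, chunk, D)
        ctx, _ = self.attn(own, counterpart, counterpart)
        return self.norm(own + ctx)

# Toy forward pass: encode two proprioceptive segments, fuse their CLS
# summaries, then run one synchronization step between per-arm chunks.
B, T, chunk = 2, 10, 8
seg, cross, sync = SegmentEncoder(), CrossSegmentEncoder(), SyncBlock()
left_cls, _ = seg(torch.randn(B, T, D))
right_cls, _ = seg(torch.randn(B, T, D))
fused = cross(torch.stack([left_cls, right_cls], dim=1))    # (B, 2, D)
left_act = torch.randn(B, chunk, D)   # stand-in per-arm decoder outputs
right_act = torch.randn(B, chunk, D)
left_ref = sync(left_act, right_act)  # refine left arm with right as context
right_ref = sync(right_act, left_act) # and vice versa
```

In the paper, the encoder and decoder are trained end-to-end with action chunking; here the per-arm decoder outputs are random stand-ins, so the snippet only demonstrates how the attention pathways connect, not the full training pipeline.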