InterACT: Inter-dependency Aware Action Chunking with Hierarchical Attention Transformers for Bimanual Manipulation

Andrew Lee, Ian Chuang, Ling-Yuan Chen, Iman Soltani
arXiv - CS - Robotics, published 2024-09-12. DOI: arxiv-2409.07914 (https://doi.org/arxiv-2409.07914)
Citations: 0

Abstract

We present InterACT: Inter-dependency aware Action Chunking with Hierarchical Attention Transformers, a novel imitation learning framework for bimanual manipulation that integrates hierarchical attention to capture inter-dependencies between dual-arm joint states and visual inputs. InterACT consists of a Hierarchical Attention Encoder and a Multi-arm Decoder, both designed to enhance information aggregation and coordination. The encoder processes multi-modal inputs through segment-wise and cross-segment attention mechanisms, while the decoder leverages synchronization blocks to refine each arm's action predictions, providing the counterpart's prediction as context. Our experiments on a variety of simulated and real-world bimanual manipulation tasks demonstrate that InterACT significantly outperforms existing methods. Detailed ablation studies validate the contributions of key components of our work, including the impact of CLS tokens, cross-segment encoders, and synchronization blocks.
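The abstract names three mechanisms: segment-wise attention, cross-segment attention over CLS tokens, and synchronization blocks that give each arm the counterpart's prediction as context. The following is a minimal sketch of that data flow, not the authors' implementation. It assumes identity projections (no learned weights), a single head, a fixed zero-valued CLS token, and three hypothetical segments (`left_arm`, `right_arm`, `camera`); the function names `attend`, `hierarchical_encode`, and `sync_block` are invented for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention. Assumption: identity projections,
    single head; a real model would use learned Q/K/V weights."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(dim)])
    return out


def self_attend(tokens):
    """Self-attention: every token attends to every token in the list."""
    return attend(tokens, tokens, tokens)


def hierarchical_encode(segments, cls_init=0.0):
    """segments: dict mapping a segment name to a list of d-dim tokens.
    Returns a dict mapping each name to [CLS summary] + encoded tokens."""
    # Step 1 (segment-wise): prepend a CLS token and attend within the segment.
    # Assumption: the CLS token is a constant vector, not a learned embedding.
    encoded = {}
    for name, tokens in segments.items():
        cls = [cls_init] * len(tokens[0])
        encoded[name] = self_attend([cls] + tokens)
    # Step 2 (cross-segment): only the CLS summaries attend to one another,
    # so information crosses segments through these summary tokens.
    names = list(encoded)
    mixed = self_attend([encoded[n][0] for n in names])
    for name, new_cls in zip(names, mixed):
        encoded[name] = [new_cls] + encoded[name][1:]
    return encoded


def sync_block(own_chunk, other_chunk):
    """Refine one arm's action chunk using the other arm's chunk as context.
    Queries come from the arm itself; keys/values cover both arms."""
    context = own_chunk + other_chunk
    return attend(own_chunk, context, context)


if __name__ == "__main__":
    segments = {
        "left_arm": [[0.1, 0.2], [0.3, 0.4]],
        "right_arm": [[0.5, 0.6], [0.7, 0.8]],
        "camera": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    }
    encoded = hierarchical_encode(segments)
    print({name: len(tokens) for name, tokens in encoded.items()})
    print(sync_block([[0.0, 1.0]], [[1.0, 0.0]]))
```

In this sketch the cross-segment step changes only the CLS summaries, which keeps its cost proportional to the number of segments rather than the number of tokens. Whether the actual model also feeds the updated summaries back into the per-segment tokens is not stated in the abstract.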