Language-Embedded 6D Pose Estimation for Tool Manipulation

IF 5.3, Zone 2 Computer Science, Q2 ROBOTICS

IEEE Robotics and Automation Letters Pub Date : 2025-07-10 DOI:10.1109/LRA.2025.3587559

Yuyang Tu;Yunlong Wang;Hui Zhang;Wenkai Chen;Jianwei Zhang

Citations: 0

Abstract

Robotic tool manipulation requires understanding task-relevant semantics under visually challenging conditions, such as shape variation and occlusion. This paper presents a novel framework for Language-Embedded Semantic 6D Pose Estimation that combines natural language instructions with 3D point cloud data to achieve category-level 6D pose estimation of tools' functional parts. By embedding semantic information from large language models (LLMs) and leveraging a diffusion-based pose estimator, our approach achieves robust generalization across diverse tool categories. We introduce a comprehensive synthetic dataset, tailored for tool manipulation scenarios, with annotated 6D poses of functional parts. Extensive experiments conducted on both the synthetic dataset and real-world robots demonstrate our system's ability to interpret natural language commands, predict poses of functional parts, and perform manipulation tasks with significant improvements in accuracy and generalization.
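For readers unfamiliar with the term, a 6D pose is a 3D rotation plus a 3D translation that places an object (here, a tool's functional part) in the camera or robot frame. The snippet below is a minimal sketch of that representation and of one common way to compare a predicted pose against a ground-truth pose. All function names are hypothetical, the numbers are made up for illustration, and the error measures shown are generic ones, not necessarily the evaluation protocol used in the paper.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(R, t, p):
    """Map point p from the part frame into the camera frame: R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def rotation_error_deg(R_a, R_b):
    """Geodesic angle between two rotations, in degrees."""
    # trace(R_a^T R_b) equals the elementwise product of the two matrices, summed.
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

def translation_error(t_a, t_b):
    """Euclidean distance between two translations (same unit as the inputs)."""
    return math.dist(t_a, t_b)

# Illustrative (made-up) ground-truth and predicted poses of a functional part.
R_gt, t_gt = rot_z(math.radians(30.0)), [0.10, 0.00, 0.50]
R_pred, t_pred = rot_z(math.radians(40.0)), [0.13, 0.04, 0.50]

rot_err = rotation_error_deg(R_gt, R_pred)   # about 10 degrees
trans_err = translation_error(t_gt, t_pred)  # about 0.05 (metres, by assumption)
```

A category-level estimator would predict `R_pred` and `t_pred` for unseen instances of a tool category; the two error values are then typically thresholded (for example, a few degrees and a few centimetres) to decide whether a prediction counts as correct.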




Source journal

IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)

CiteScore: 9.60

Self-citation rate: 15.40%

Publication count: 1428

Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.