A Multilevel Similarity Approach for Single-View Object Grasping: Matching, Planning, and Fine-Tuning

IF 10.5 Zone 1 (Computer Science) Q1 ROBOTICS

IEEE Transactions on Robotics Pub Date : 2025-07-14 DOI:10.1109/TRO.2025.3588720

Hao Chen;Takuya Kiyokawa;Zhengtao Hu;Weiwei Wan;Kensuke Harada

IEEE Transactions on Robotics, vol. 41, pp. 500-519. https://ieeexplore.ieee.org/document/11079240/

Citations: 0

Abstract

Grasping unknown objects from a single view remains a challenging problem in robotics because of the uncertainty of partial observation. Recent advances in large-scale models have led to benchmark solutions such as GraspNet-1Billion. However, such learning-based approaches still face a critical limitation in robustness because they are sensitive to sensing noise and environmental changes. To address this bottleneck in achieving highly generalized grasping, we abandon the traditional learning framework and introduce a new perspective: similarity matching, in which similar known objects guide the grasping of unknown target objects. We propose a method that robustly grasps unknown objects from a single viewpoint through three key steps: 1) use the visual features of the observed object to perform similarity matching against an existing database of object models, identifying candidates with high similarity; 2) use the candidate models, which carry pre-existing grasping knowledge, to plan imitative grasps for the unknown target object; 3) optimize grasp quality through a local fine-tuning process. To address the uncertainty caused by partial and noisy observation, we propose a multilevel similarity matching framework that integrates semantic, geometric, and dimensional features for comprehensive evaluation. In particular, we introduce a novel point cloud geometric descriptor, the clustered fast point feature histogram descriptor, which facilitates accurate similarity assessment between partial point clouds of observed objects and complete point clouds of database models. In addition, we incorporate large language models, introduce the semioriented bounding box, and develop a novel point cloud registration approach based on plane detection to enhance matching accuracy under single-view conditions. Real-world experiments demonstrate that the proposed method significantly outperforms existing benchmarks in grasping a wide variety of unknown objects in both isolated and cluttered scenarios, showing robustness across varying object types and operating environments.
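To make the idea of multilevel similarity matching concrete, the following is a minimal sketch of how semantic, geometric, and dimensional scores could be fused to rank database models against an observed object. It is not the authors' implementation: the function names, the equal weights, the cosine and histogram-intersection measures, and the dictionary layout (`embedding`, `histogram`, `extents`) are all assumptions made for illustration, and a generic histogram stands in for the clustered fast point feature histogram descriptor, which the abstract does not specify.

```python
import numpy as np


def cosine_similarity(a, b):
    """Semantic level (assumed): cosine similarity of two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def histogram_intersection(h1, h2):
    """Geometric level (assumed): overlap of two normalized histograms in [0, 1]."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    if h1.sum() <= 0 or h2.sum() <= 0:
        return 0.0
    return float(np.minimum(h1 / h1.sum(), h2 / h2.sum()).sum())


def dimension_similarity(d1, d2):
    """Dimensional level (assumed): mean min/max ratio of sorted box extents.

    Extents are assumed to be strictly positive lengths.
    """
    d1 = np.sort(np.asarray(d1, dtype=float))
    d2 = np.sort(np.asarray(d2, dtype=float))
    return float(np.mean(np.minimum(d1, d2) / np.maximum(d1, d2)))


def rank_candidates(target, database, weights=(1 / 3, 1 / 3, 1 / 3), top_k=3):
    """Return the top_k (name, score) pairs, best match first.

    `weights` are hypothetical fusion weights, not values from the paper.
    """
    w_sem, w_geo, w_dim = weights
    scored = []
    for name, model in database.items():
        score = (
            w_sem * cosine_similarity(target["embedding"], model["embedding"])
            + w_geo * histogram_intersection(target["histogram"], model["histogram"])
            + w_dim * dimension_similarity(target["extents"], model["extents"])
        )
        scored.append((name, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

Under this sketch, the highest-ranked models would be the candidates whose stored grasps are imitated and then fine-tuned locally; how the actual method weights or sequences the three levels is not stated in the abstract.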


Source journal

IEEE Transactions on Robotics (Engineering & Technology: Robotics)

CiteScore

14.90

Self-citation rate

5.10%

Articles published

259

Review time

6.0 months

Journal description: The IEEE Transactions on Robotics (T-RO) is dedicated to publishing fundamental papers covering all facets of robotics, drawing on interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, and beyond. From industrial applications to service and personal assistants, surgical operations to space, underwater, and remote exploration, robots and intelligent machines play pivotal roles across various domains, including entertainment, safety, search and rescue, military applications, agriculture, and intelligent vehicles. Special emphasis is placed on intelligent machines and systems designed for unstructured environments, where a significant portion of the environment remains unknown and beyond direct sensing or control.