FreeMask3D: Zero-Shot Point Cloud Instance Segmentation Without 3D Training

Impact factor 5.3; ranking zone 2 (Computer Science); JCR Q2 (Robotics)
Mingquan Zhou, Xiaodong Wu, Chen He, Ruiping Wang, Xilin Chen
IEEE Robotics and Automation Letters, vol. 10, no. 12, pp. 12301-12308, published 2025-10-15
DOI: 10.1109/LRA.2025.3621977
URL: https://ieeexplore.ieee.org/document/11203973/
Citations: 0

Abstract

Point cloud instance segmentation is crucial for 3D scene understanding in robotics. However, existing methods rely heavily on learning-based approaches that require large amounts of annotated 3D data, which makes annotation costly. Cost-effective, data-efficient solutions are therefore essential. To this end, we propose FreeMask3D, a novel approach that performs 3D point cloud instance segmentation without any 3D annotation or additional training. Our method consists of two main steps: instance localization and instance recognition. In the localization step, we apply pre-trained 2D instance segmentation models to the RGB-D frames that correspond to the point cloud; the resulting 2D masks are then mapped into 3D space and fused across frames to produce the final 3D instance masks. In the recognition step, the OpenSem module infers the category of each instance by exploiting the generalization ability of large cross-modal models such as CLIP, enabling open-vocabulary semantic recognition. Experiments and ablation studies on four challenging benchmarks (ScanNetv2, ScanNet200, S3DIS, and Replica) demonstrate that FreeMask3D achieves performance competitive with or superior to state-of-the-art methods, despite using no 3D supervision. Qualitative results highlight its open-vocabulary capabilities for queries based on color, affordance, or uncommon phrase descriptions.
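The abstract describes the pipeline only at a high level. As a rough illustration, and not the authors' implementation, the sketch below shows one common way to realize the three steps it names. It maps a 2D instance mask into 3D by projecting scene points into an RGB-D frame with a pinhole camera model plus a depth-consistency (visibility) check, fuses the per-frame point sets across frames by greedy IoU merging, and labels each fused instance by cosine similarity between an averaged per-view image embedding and per-category text embeddings. The function names (`lift_mask`, `fuse_masks`, `classify`), the thresholds `depth_tol` and `thresh`, the projection direction, and the greedy merge rule are all hypothetical assumptions; the plain vectors stand in for CLIP features, and the abstract does not say how OpenSem actually builds or compares them. It uses pure Python (no NumPy) to stay self-contained.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical helpers illustrating a generic 2D-to-3D mask pipeline;
# this is NOT FreeMask3D's code, whose details the abstract does not give.
Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]  # 3x4 row-major world-to-camera transform [R | t]


def to_camera(p: Point, world_to_cam: Matrix) -> Point:
    """Apply a rigid world-to-camera transform to one 3D point."""
    x, y, z = p
    xc, yc, zc = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in world_to_cam[:3])
    return (xc, yc, zc)


def lift_mask(points: Sequence[Point], world_to_cam: Matrix,
              mask: Sequence[Sequence[bool]], depth: Matrix,
              intrinsics: Tuple[float, float, float, float],
              depth_tol: float = 0.05) -> Set[int]:
    """Indices of scene points that project into a 2D mask and are visible in this frame."""
    fx, fy, cx, cy = intrinsics
    h, w = len(mask), len(mask[0])
    hits: Set[int] = set()
    for i, p in enumerate(points):
        x, y, z = to_camera(p, world_to_cam)
        if z <= 0:
            continue  # point is behind the camera
        u, v = round(fx * x / z + cx), round(fy * y / z + cy)  # pinhole projection
        if not (0 <= u < w and 0 <= v < h):
            continue  # projects outside the image
        # Depth check: reject points hidden behind a closer surface at this pixel.
        if mask[v][u] and abs(depth[v][u] - z) < depth_tol:
            hits.add(i)
    return hits


def iou(a: Set[int], b: Set[int]) -> float:
    """Intersection-over-union of two point-index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def fuse_masks(per_frame: List[Set[int]], thresh: float = 0.5) -> List[Set[int]]:
    """Greedily merge each per-frame point set into the best-overlapping instance."""
    instances: List[Set[int]] = []
    for m in per_frame:
        if not m:
            continue
        best = max(instances, key=lambda inst: iou(inst, m), default=None)
        if best is not None and iou(best, m) > thresh:
            best |= m  # in-place union updates the stored instance
        else:
            instances.append(set(m))  # start a new instance (copy the input set)
    return instances


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def classify(view_embs: Sequence[Sequence[float]],
             text_embs: Dict[str, Sequence[float]]) -> str:
    """Average per-view image embeddings, then pick the label with the highest cosine similarity."""
    mean = [sum(col) / len(view_embs) for col in zip(*view_embs)]
    return max(text_embs, key=lambda label: cosine(mean, text_embs[label]))
```

In practice, `per_frame` would hold one point set per 2D mask predicted by the pre-trained segmenter in each frame, and the vectors passed to `classify` would come from CLIP's image and text encoders. Greedy merging is simple but depends on frame order; building a graph of overlapping masks and merging its connected components is a more order-independent alternative.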
Source journal: IEEE Robotics and Automation Letters (Computer Science: Computer Science Applications)
CiteScore: 9.60
Self-citation rate: 15.40%
Articles published: 1428
Journal description: The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.