Authors: Mingquan Zhou; Xiaodong Wu; Chen He; Ruiping Wang; Xilin Chen
DOI: 10.1109/LRA.2025.3621977
Journal: IEEE Robotics and Automation Letters, vol. 10, no. 12, pp. 12301-12308 (JCR Q2, Robotics; impact factor 5.3)
Publication date: 2025-10-15 (Journal Article)
URL: https://ieeexplore.ieee.org/document/11203973/
FreeMask3D: Zero-Shot Point Cloud Instance Segmentation Without 3D Training
Point cloud instance segmentation is crucial for 3D scene understanding in robotics. However, existing methods rely heavily on learning-based approaches that require large amounts of annotated 3D data, resulting in high annotation costs. Developing cost-effective and data-efficient solutions is therefore essential. To this end, we propose FreeMask3D, a novel approach that achieves 3D point cloud instance segmentation without requiring any 3D annotation or additional training. Our method consists of two main steps: instance localization and instance recognition. For instance localization, we leverage pre-trained 2D instance segmentation models to perform instance segmentation on corresponding RGB-D images. These results are then mapped to 3D space and fused across frames to generate the final 3D instance masks. For instance recognition, the OpenSem module infers the category of each instance by leveraging the generalization capabilities of cross-modal large models, such as CLIP, to enable open-vocabulary semantic recognition. Experiments and ablation studies on four challenging benchmarks—ScanNetv2, ScanNet200, S3DIS, and Replica—demonstrate that FreeMask3D achieves competitive or superior performance compared to state-of-the-art methods, despite using no 3D supervision. Qualitative results highlight its open-vocabulary capabilities in response to descriptions based on color, affordance, or uncommon phrases.
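The abstract's 2D-to-3D lifting step can be illustrated with a minimal pinhole back-projection sketch. This is standard camera geometry, not the authors' implementation: `backproject_mask`, the toy intrinsics, and the depth values below are illustrative assumptions.

```python
import numpy as np

def backproject_mask(mask, depth, K):
    """Lift the pixels of a 2D instance mask into 3D camera coordinates.

    mask  : (H, W) boolean instance mask from a 2D segmenter
    depth : (H, W) depth map in meters (0 = missing reading)
    K     : (3, 3) camera intrinsics matrix
    Returns an (N, 3) array of 3D points for the masked pixels.
    """
    v, u = np.nonzero(mask)            # pixel rows (v) and columns (u)
    z = depth[v, u]
    valid = z > 0                      # skip pixels with no depth
    u, v, z = u[valid], v[valid], z[valid]
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * z / fx              # pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# Toy example: a 2x2 image with one masked pixel at depth 2 m.
K = np.array([[500.0,   0.0, 1.0],
              [  0.0, 500.0, 1.0],
              [  0.0,   0.0, 1.0]])
mask = np.array([[True, False], [False, False]])
depth = np.array([[2.0, 0.0], [0.0, 0.0]])
pts = backproject_mask(mask, depth, K)
# pixel (u=0, v=0): x = (0 - 1) * 2 / 500 = -0.004, likewise y, z = 2.0
```

Per-frame point sets produced this way would then be transformed by each frame's camera pose and fused into scene-level instance masks; that fusion logic is specific to the paper and not sketched here.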
Journal Introduction:
The scope of this journal is to publish peer-reviewed articles that provide a timely and concise account of innovative research ideas and application results, reporting significant theoretical findings and application case studies in areas of robotics and automation.