High-Quality Pseudo-Labeling for Point Cloud Segmentation With Scene-Level Annotation

IF 18.6

IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-06-25 | DOI: 10.1109/TPAMI.2025.3583071

Lunhao Duan; Shanshan Zhao; Xingxing Weng; Jing Zhang; Gui-Song Xia

Vol. 47, No. 10, pp. 9360-9366. Full text: https://ieeexplore.ieee.org/document/11050997/

Abstract

This paper investigates indoor point cloud semantic segmentation under scene-level annotation, a setting that is less explored than segmentation with sparse point-level labels. In the absence of precise point-level labels, current methods first generate point-level pseudo-labels, which are then used to train segmentation models. However, generating accurate pseudo-labels for each point based solely on scene-level annotations is a considerable challenge and substantially affects segmentation performance. To improve accuracy, this paper proposes a high-quality pseudo-label generation framework that exploits contemporary multi-modal information and region-point semantic consistency. Specifically, with a cross-modal feature guidance module, our method uses 2D-3D correspondences to align point cloud features with the corresponding 2D image pixels, thereby assisting point cloud feature learning. To further alleviate the challenge posed by scene-level annotation, we introduce a region-point semantic consistency module. It derives regional semantics from point-level semantics through a region-voting strategy, and the regional semantics are then used to guide the point-level semantic predictions. With these two modules, our method can rectify inaccurate point-level semantic predictions during training and obtain high-quality pseudo-labels. Significant improvements over previous works on the ScanNet v2 and S3DIS datasets under scene-level annotation demonstrate the effectiveness of the method. Additionally, comprehensive ablation studies validate the contributions of the individual components.
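
The region-voting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `region_vote`, the use of hard labels, the majority-vote rule, the tie-breaking rule, and the filtering by scene-level classes are all assumptions made for illustration, and how regions are formed (for example by over-segmentation) is left outside the sketch.

```python
from collections import Counter, defaultdict


def region_vote(point_labels, region_ids, scene_classes=None):
    """Majority-vote one label per region, then broadcast it back to points.

    point_labels  : list[int], predicted class per point (hypothetical input)
    region_ids    : list[int], region index per point (how regions are built
                    is an assumption, e.g. an over-segmentation)
    scene_classes : optional set[int] of classes named by the scene-level
                    annotation; votes for any other class are ignored
    """
    if len(point_labels) != len(region_ids):
        raise ValueError("point_labels and region_ids must have equal length")

    # Count votes per region, skipping classes absent from the scene label.
    votes = defaultdict(Counter)
    for label, region in zip(point_labels, region_ids):
        if scene_classes is None or label in scene_classes:
            votes[region][label] += 1

    # Winner per region; ties resolve to the smallest class id.
    region_label = {
        region: min(counts, key=lambda c: (-counts[c], c))
        for region, counts in votes.items()
    }

    # Points in a region with no valid votes keep their original prediction.
    refined = [
        region_label.get(region, label)
        for label, region in zip(point_labels, region_ids)
    ]
    return region_label, refined


if __name__ == "__main__":
    preds = [0, 0, 1, 2, 2, 2, 5]
    regions = [0, 0, 0, 1, 1, 1, 2]
    print(region_vote(preds, regions, scene_classes={0, 1, 2}))
```

In this toy input, the single dissenting point in region 0 is overwritten by the regional majority, which is the sense in which regional semantics can guide point-level predictions. A real pipeline would more likely vote over soft class scores and apply the result as a training signal rather than a hard overwrite.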

Code is available at https://github.com/LHDuan/WSegPC.
