WSSIC-Net: Weakly-Supervised Semantic Instance Completion of 3D Point Cloud Scenes

IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society). Pub Date: 2025-03-27. DOI: 10.1109/TIP.2024.3520013

Zhiheng Fu; Yulan Guo; Minglin Chen; Qingyong Hu; Hamid Laga; Farid Boussaid; Mohammed Bennamoun

Volume 34, pages 2008-2019. Publication type: Journal Article. Full text: https://ieeexplore.ieee.org/document/10944293/

Citations: 0

Abstract

Semantic instance completion aims to recover the complete 3D shapes of foreground objects together with their labels from a partial 2.5D scan of a scene. Previous works have relied on full supervision, which requires ground-truth annotations, in the form of bounding boxes and complete 3D objects. This has greatly limited their real-world application because the acquisition of ground-truth data is very costly and time-consuming. To address this bottleneck, we propose a Weakly-Supervised Semantic Instance Completion Network (WSSIC-Net), which learns real-world partial point cloud object completion without requiring the ground truth of complete 3D objects. Instead, WSSIC-Net leverages 3D ground-truth bounding boxes, partial objects of a raw scene, and unpaired synthetic 3D point clouds. More specifically, a 3D detector is used to encode partial point clouds into proposal features, which are then fed into two branches. The first branch uses fully supervised box prediction based on proposal features. The second branch, hereinafter called instance completion, leverages the proposal features as partial object features to achieve weakly-supervised instance completion. A Generative Adversarial Network (GAN) completes the partial features of the 2.5D foreground objects of real-world scenes using only unpaired but semantically-consistent complete synthetic point clouds. In our experiments, we demonstrate that the fully-supervised 3D detection and the weakly-supervised instance completion complement one another. The qualitative and quantitative evaluations on the ScanNet v2 dataset demonstrate that the proposed “weakly-supervised” approach consistently achieves comparable performance to the state-of-the-art “fully supervised” methods.
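The data flow described in the abstract (proposal features feeding a box branch and a completion branch, with a GAN aligning completed real features to unpaired synthetic ones) can be illustrated with a rough NumPy sketch. This is not the authors' implementation. The layer sizes, the random matrices standing in for learned layers, every function name, and the standard non-saturating GAN loss are assumptions made only for illustration; the abstract does not specify any of them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration (not from the paper).
NUM_PROPOSALS, FEAT_DIM, NUM_CLASSES, NUM_POINTS = 8, 128, 10, 256


def linear(in_dim, out_dim):
    """Random weight matrix standing in for a learned layer."""
    return rng.standard_normal((in_dim, out_dim)) * 0.1


W_BOX = linear(FEAT_DIM, 6 + NUM_CLASSES)  # box branch: centre (3) + size (3) + class scores
W_GEN = linear(FEAT_DIM, FEAT_DIM)         # generator: partial feature -> completed feature
W_DEC = linear(FEAT_DIM, NUM_POINTS * 3)   # decoder: completed feature -> point cloud
W_DIS = linear(FEAT_DIM, 1)                # discriminator: feature -> single logit


def box_branch(proposal_feats):
    """Branch 1 (fully supervised in the paper): box parameters and class scores."""
    out = proposal_feats @ W_BOX
    return out[:, :6], out[:, 6:]


def completion_branch(proposal_feats):
    """Branch 2: treat proposal features as partial-object features and complete them."""
    completed_feats = np.tanh(proposal_feats @ W_GEN)
    points = (completed_feats @ W_DEC).reshape(-1, NUM_POINTS, 3)
    return completed_feats, points


def discriminator(feats):
    """Probability that a feature comes from a complete synthetic shape."""
    return 1.0 / (1.0 + np.exp(-(feats @ W_DIS)))


def adversarial_losses(completed_feats, synthetic_feats, eps=1e-8):
    """Standard non-saturating GAN losses (assumed; the abstract names no objective)."""
    d_real = discriminator(synthetic_feats)
    d_fake = discriminator(completed_feats)
    d_loss = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))
    return d_loss, g_loss


# Random stand-ins for the 3D detector output and for encoded, unpaired synthetic shapes.
proposal_feats = rng.standard_normal((NUM_PROPOSALS, FEAT_DIM))
synthetic_feats = np.tanh(rng.standard_normal((NUM_PROPOSALS, FEAT_DIM)))

boxes, class_scores = box_branch(proposal_feats)
completed_feats, points = completion_branch(proposal_feats)
d_loss, g_loss = adversarial_losses(completed_feats, synthetic_feats)
print(boxes.shape, class_scores.shape, points.shape)
```

The sketch only traces tensor shapes through the two branches. In this toy setup the synthetic features are never paired with a specific real proposal, which is the sense in which the completion supervision is weak; the box branch would be trained against ground-truth boxes in the usual supervised way.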



