Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud Semantic Segmentation

Shuting He, Xudong Jiang, Wei Jiang, Henghui Ding
{"title":"Prototype Adaption and Projection for Few- and Zero-shot 3D Point Cloud\n Semantic Segmentation","authors":"He, Shuting, Jiang, Xudong, Jiang, Wei, Ding, Henghui","doi":"10.48550/arxiv.2305.14335","DOIUrl":null,"url":null,"abstract":"In this work, we address the challenging task of few-shot and zero-shot 3D point cloud semantic segmentation. The success of few-shot semantic segmentation in 2D computer vision is mainly driven by the pre-training on large-scale datasets like imagenet. The feature extractor pre-trained on large-scale 2D datasets greatly helps the 2D few-shot learning. However, the development of 3D deep learning is hindered by the limited volume and instance modality of datasets due to the significant cost of 3D data collection and annotation. This results in less representative features and large intra-class feature variation for few-shot 3D point cloud segmentation. As a consequence, directly extending existing popular prototypical methods of 2D few-shot classification/segmentation into 3D point cloud segmentation won't work as well as in 2D domain. To address this issue, we propose a Query-Guided Prototype Adaption (QGPA) module to adapt the prototype from support point clouds feature space to query point clouds feature space. With such prototype adaption, we greatly alleviate the issue of large feature intra-class variation in point cloud and significantly improve the performance of few-shot 3D segmentation. Besides, to enhance the representation of prototypes, we introduce a Self-Reconstruction (SR) module that enables prototype to reconstruct the support mask as well as possible. Moreover, we further consider zero-shot 3D point cloud semantic segmentation where there is no support sample. To this end, we introduce category words as semantic information and propose a semantic-visual projection model to bridge the semantic and visual spaces. Our proposed method surpasses state-of-the-art algorithms by a considerable 7.90% and 14.82% under the 2-way 1-shot setting on S3DIS and ScanNet benchmarks, respectively. Code is available at https://github.com/heshuting555/PAP-FZS3D.","PeriodicalId":496270,"journal":{"name":"arXiv (Cornell University)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv (Cornell University)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arxiv.2305.14335","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

In this work, we address the challenging task of few-shot and zero-shot 3D point cloud semantic segmentation. The success of few-shot semantic segmentation in 2D computer vision is mainly driven by pre-training on large-scale datasets such as ImageNet. A feature extractor pre-trained on large-scale 2D datasets greatly helps 2D few-shot learning. However, the development of 3D deep learning is hindered by the limited volume and instance modality of datasets, owing to the significant cost of 3D data collection and annotation. This results in less representative features and large intra-class feature variation for few-shot 3D point cloud segmentation. As a consequence, directly extending existing prototypical methods from 2D few-shot classification/segmentation to 3D point cloud segmentation does not work as well as it does in the 2D domain. To address this issue, we propose a Query-Guided Prototype Adaption (QGPA) module that adapts prototypes from the support point cloud feature space to the query point cloud feature space. This prototype adaption greatly alleviates the issue of large intra-class feature variation in point clouds and significantly improves few-shot 3D segmentation performance. In addition, to enhance the representation of prototypes, we introduce a Self-Reconstruction (SR) module that encourages each prototype to reconstruct its support mask as well as possible. Moreover, we further consider zero-shot 3D point cloud semantic segmentation, where no support sample is available. To this end, we introduce category words as semantic information and propose a semantic-visual projection model to bridge the semantic and visual spaces. Our proposed method surpasses state-of-the-art algorithms by a considerable 7.90% and 14.82% under the 2-way 1-shot setting on the S3DIS and ScanNet benchmarks, respectively. Code is available at https://github.com/heshuting555/PAP-FZS3D.
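
To ground the prototypical pipeline the abstract builds on, the following is a minimal, self-contained sketch of standard prototype-based few-shot point cloud segmentation: masked average pooling over support-point features to form class prototypes, then cosine-similarity matching of query points against those prototypes. It is a generic baseline written in PyTorch under assumed tensor shapes; it is not the paper's QGPA, SR, or semantic-visual projection modules, and every name in it (masked_average_pooling, segment_with_prototypes, semantic_to_visual) is hypothetical.

import torch
import torch.nn.functional as F

def masked_average_pooling(point_feat, mask):
    # point_feat: (N, C) per-point features; mask: (N,) boolean class mask.
    # Returns a (C,) class prototype as the mean feature over masked points.
    m = mask.float().unsqueeze(-1)                      # (N, 1)
    return (point_feat * m).sum(dim=0) / m.sum().clamp(min=1.0)

def segment_with_prototypes(query_feat, prototypes):
    # query_feat: (M, C); prototypes: (K, C), one per class (incl. background).
    # Predicts a class per query point by cosine similarity to the prototypes.
    q = F.normalize(query_feat, dim=-1)
    p = F.normalize(prototypes, dim=-1)
    return (q @ p.t()).argmax(dim=-1)                   # (M,)

# Toy 2-way 1-shot episode on random features (stand-ins for backbone outputs).
C, N, M = 32, 1024, 2048
support_feat = torch.randn(N, C)
fg_mask = torch.rand(N) > 0.5
prototypes = torch.stack([
    masked_average_pooling(support_feat, ~fg_mask),     # background prototype
    masked_average_pooling(support_feat, fg_mask),      # foreground prototype
])
query_feat = torch.randn(M, C)
pred = segment_with_prototypes(query_feat, prototypes)  # (M,) labels in {0, 1}

# Zero-shot variant (illustrative assumption): map a class word embedding into
# the visual feature space with a learned linear projection and use the result
# as the prototype when no support sample is available.
semantic_to_visual = torch.nn.Linear(300, C)            # 300-d word vectors assumed
class_word_embedding = torch.randn(300)                 # stand-in for a category word embedding
zero_shot_prototype = semantic_to_visual(class_word_embedding)  # (C,)

The paper's contributions can be read against this baseline: per the abstract, QGPA adapts the prototypes toward the query feature space before the matching step, SR constrains prototypes to reconstruct the support mask, and the zero-shot branch replaces the support-derived prototype with a projection from the semantic space, as hinted at the end of the sketch.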