Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation

Shuting He, Henghui Ding, Wei Jiang
{"title":"Primitive Generation and Semantic-Related Alignment for Universal Zero-Shot Segmentation","authors":"Shuting He, Henghui Ding, Wei Jiang","doi":"10.1109/CVPR52729.2023.01081","DOIUrl":null,"url":null,"abstract":"We study universal zero-shot segmentation in this work to achieve panoptic, instance, and semantic segmentation for novel categories without any training samples. Such zero-shot segmentation ability relies on inter-class relationships in semantic space to transfer the visual knowledge learned from seen categories to unseen ones. Thus, it is desired to well bridge semantic-visual spaces and apply the semantic relationships to visual feature learning. We introduce a generative model to synthesize features for unseen categories, which links semantic and visual spaces as well as address the issue of lack of unseen training data. Furthermore, to mitigate the domain gap between semantic and visual spaces, firstly, we enhance the vanilla generator with learned primitives, each of which contains fine-grained attributes related to categories, and synthesize unseen features by selectively assembling these primitives. Secondly, we propose to disentangle the visual feature into the semantic-related part and the semantic-unrelated part that contains useful visual classification clues but is less relevant to semantic representation. The inter-class relationships of semantic-related visual features are then required to be aligned with those in semantic space, thereby transferring semantic knowledge to visual feature learning. The proposed approach achieves impressively state-of-the-art performance on zero-shot panoptic segmentation, instance segmentation, and semantic segmentation.","PeriodicalId":376416,"journal":{"name":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","volume":"54 3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CVPR52729.2023.01081","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

Abstract

We study universal zero-shot segmentation in this work to achieve panoptic, instance, and semantic segmentation for novel categories without any training samples. Such zero-shot segmentation ability relies on inter-class relationships in semantic space to transfer visual knowledge learned from seen categories to unseen ones. It is therefore desirable to bridge the semantic and visual spaces well and to apply the semantic relationships to visual feature learning. We introduce a generative model that synthesizes features for unseen categories, which links the semantic and visual spaces and addresses the lack of training data for unseen categories. Furthermore, to mitigate the domain gap between the semantic and visual spaces, we first enhance the vanilla generator with learned primitives, each of which encodes fine-grained, category-related attributes, and synthesize unseen features by selectively assembling these primitives. Second, we propose to disentangle each visual feature into a semantic-related part and a semantic-unrelated part; the latter contains useful visual classification cues but is less relevant to the semantic representation. The inter-class relationships of the semantic-related visual features are then required to align with those in semantic space, thereby transferring semantic knowledge to visual feature learning. The proposed approach achieves state-of-the-art performance on zero-shot panoptic segmentation, instance segmentation, and semantic segmentation.
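To make the feature-generation idea concrete, the following is a minimal sketch of a primitive-based generator: a bank of learned primitive embeddings is selectively assembled, via attention weights conditioned on a category's semantic embedding, to synthesize a visual feature for that (possibly unseen) category. All module names, dimensions, and the attention-style assembly below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class PrimitiveGenerator(nn.Module):
    """Minimal sketch: synthesize class visual features by assembling
    learned primitives, conditioned on a semantic (e.g. word) embedding.
    Names and dimensions are illustrative, not the paper's exact design."""

    def __init__(self, sem_dim=512, vis_dim=256, num_primitives=64):
        super().__init__()
        # Bank of learnable primitives, each a candidate fine-grained attribute.
        self.primitives = nn.Parameter(torch.randn(num_primitives, vis_dim))
        # Project the semantic embedding into a query over the primitive bank.
        self.to_query = nn.Linear(sem_dim, vis_dim)
        # Noise projection so the generator can sample diverse features.
        self.to_noise = nn.Linear(vis_dim, vis_dim)

    def forward(self, sem_emb, noise=None):
        # sem_emb: (B, sem_dim) semantic embeddings of target categories.
        q = self.to_query(sem_emb)                          # (B, vis_dim)
        if noise is not None:
            q = q + self.to_noise(noise)                    # stochastic generation
        # Selectively weight primitives by similarity to the query.
        attn = torch.softmax(q @ self.primitives.t(), dim=-1)  # (B, P)
        # Assemble the synthetic visual feature as a weighted combination.
        return attn @ self.primitives                       # (B, vis_dim)

# Usage: synthesize features for unseen categories from their embeddings.
gen = PrimitiveGenerator()
sem = torch.randn(4, 512)                 # e.g. word-vector class embeddings
feat = gen(sem, noise=torch.randn(4, 256))
print(feat.shape)                         # torch.Size([4, 256])
```

Because every synthesized feature is a combination of shared primitives, unseen categories can reuse the fine-grained attributes learned from seen ones, which is the intended benefit over a plain noise-conditioned generator.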
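The disentanglement and alignment steps can be sketched just as compactly: a visual feature is split into a semantic-related part and a semantic-unrelated part, and the pairwise inter-class similarities of the semantic-related parts are pushed to match those of the corresponding semantic embeddings. The two-head linear split, cosine similarities, and MSE matching below are assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Disentangler(nn.Module):
    """Minimal sketch: split a visual feature into semantic-related and
    semantic-unrelated parts. The two-head linear design is an assumption."""

    def __init__(self, vis_dim=256):
        super().__init__()
        self.related = nn.Linear(vis_dim, vis_dim)
        self.unrelated = nn.Linear(vis_dim, vis_dim)

    def forward(self, feat):
        return self.related(feat), self.unrelated(feat)

def relation_alignment_loss(related_feats, sem_embs):
    """Align the inter-class similarity structure of semantic-related visual
    features with the similarity structure of the semantic space."""
    v = F.normalize(related_feats, dim=-1)   # (C, vis_dim), one row per class
    s = F.normalize(sem_embs, dim=-1)        # (C, sem_dim)
    rel_v = v @ v.t()                        # visual inter-class similarities
    rel_s = s @ s.t()                        # semantic inter-class similarities
    return F.mse_loss(rel_v, rel_s)

# Usage with per-class features for C = 10 seen classes (illustrative):
dis = Disentangler()
class_feats = torch.randn(10, 256)           # visual features, one per class
sem_embs = torch.randn(10, 512)              # semantic embeddings per class
related, unrelated = dis(class_feats)
loss = relation_alignment_loss(related, sem_embs)
print(float(loss))
```

Matching similarity matrices rather than individual features transfers only the inter-class structure, which is precisely the relational knowledge that zero-shot transfer relies on; the semantic-unrelated part is left free to carry classification cues that semantics cannot express.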
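Finally, the synthesized features address the lack of unseen training data: once the generator is trained on seen categories, one can synthesize features for unseen categories and fit a classifier on them for zero-shot inference. The sketch below assumes the PrimitiveGenerator from the first snippet is in scope and already trained; the plain linear classifier and the training loop are hypothetical illustrations of this transfer step.

```python
import torch
import torch.nn as nn

# Hypothetical setup: 'gen' is a trained PrimitiveGenerator (see above) and
# 'unseen_sem' holds semantic embeddings for U unseen categories.
gen = PrimitiveGenerator()                  # assume weights already trained
unseen_sem = torch.randn(5, 512)            # U = 5 unseen classes

classifier = nn.Linear(256, 5)              # scores features against unseen classes
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

for step in range(100):
    # Sample a batch of synthetic features, several per unseen class.
    labels = torch.randint(0, 5, (32,))
    noise = torch.randn(32, 256)
    with torch.no_grad():                   # generator is frozen at this stage
        feats = gen(unseen_sem[labels], noise)
    loss = ce(classifier(feats), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```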