Concept-Level Semantic Transfer and Context-Level Distribution Modeling for Few-Shot Segmentation

IF 11.1 | CAS Tier 1 (Engineering & Technology) | JCR Q1, Engineering, Electrical & Electronic
Yuxuan Luo;Jinpeng Chen;Runmin Cong;Horace Ho Shing Ip;Sam Kwong
{"title":"Concept-Level Semantic Transfer and Context-Level Distribution Modeling for Few-Shot Segmentation","authors":"Yuxuan Luo;Jinpeng Chen;Runmin Cong;Horace Ho Shing Ip;Sam Kwong","doi":"10.1109/TCSVT.2025.3554013","DOIUrl":null,"url":null,"abstract":"Few-shot segmentation (FSS) methods aim to segment objects using only a few pixel-level annotated samples. Current approaches either derive a generalized class representation from support samples to guide the segmentation of query samples, which often discards crucial spatial contextual information, or rely heavily on spatial affinity between support and query samples, without adequately summarizing and utilizing the core information of the target class. Consequently, the former struggles with fine detail accuracy, while the latter tends to produce errors in overall localization. To address these issues, we propose a novel FSS framework, CCFormer, which balances the transmission of core semantic concepts with the modeling of spatial context, improving both macro and micro-level segmentation accuracy. Our approach introduces three key modules: 1) the Concept Perception Generation (CPG) module, which leverages pre-trained category perception capabilities to capture high-quality core representations of the target class; 2) the Concept-Feature Integration (CFI) module, which injects the core class information into both support and query features during feature extraction; and 3) the Contextual Distribution Mining (CDM) module, which utilizes a Brownian Distance Covariance matrix to model the spatial-channel distribution between support and query samples, preserving the fine-grained integrity of the target. Experimental results on the PASCAL-<inline-formula> <tex-math>$5^{i}$ </tex-math></inline-formula> and COCO-<inline-formula> <tex-math>$20^{i}$ </tex-math></inline-formula> datasets demonstrate that CCFormer achieves state-of-the-art performance, with visualizations further validating its effectiveness. Our code is available at github.com/lourise/ccformer.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9190-9204"},"PeriodicalIF":11.1000,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10937737/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Few-shot segmentation (FSS) methods aim to segment objects using only a few pixel-level annotated samples. Current approaches either derive a generalized class representation from support samples to guide the segmentation of query samples, which often discards crucial spatial contextual information, or rely heavily on spatial affinity between support and query samples, without adequately summarizing and utilizing the core information of the target class. Consequently, the former struggles with fine detail accuracy, while the latter tends to produce errors in overall localization. To address these issues, we propose a novel FSS framework, CCFormer, which balances the transmission of core semantic concepts with the modeling of spatial context, improving both macro and micro-level segmentation accuracy. Our approach introduces three key modules: 1) the Concept Perception Generation (CPG) module, which leverages pre-trained category perception capabilities to capture high-quality core representations of the target class; 2) the Concept-Feature Integration (CFI) module, which injects the core class information into both support and query features during feature extraction; and 3) the Contextual Distribution Mining (CDM) module, which utilizes a Brownian Distance Covariance matrix to model the spatial-channel distribution between support and query samples, preserving the fine-grained integrity of the target. Experimental results on the PASCAL- $5^{i}$ and COCO- $20^{i}$ datasets demonstrate that CCFormer achieves state-of-the-art performance, with visualizations further validating its effectiveness. Our code is available at github.com/lourise/ccformer.
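The CDM module's central ingredient, the Brownian Distance Covariance (BDC) matrix, may be unfamiliar. Below is a minimal sketch of the general BDC computation (pairwise Euclidean distances, double-centering, then an inner product between the centered matrices), which is the statistic the abstract refers to. The function names, the (N, D) tensor layout, and the PyTorch framing are illustrative assumptions and are not taken from CCFormer's released code.

```
import torch

def bdc_matrix(feat):
    """Double-centered Euclidean distance matrix (the 'BDC matrix') of a
    feature set. feat: (N, D) -- N spatial positions, D channels."""
    # pairwise Euclidean distances between spatial positions
    sq = (feat ** 2).sum(dim=1, keepdim=True)                      # (N, 1)
    dist = torch.sqrt(torch.clamp(sq + sq.t() - 2 * feat @ feat.t(), min=1e-12))
    # double-centering: subtract row/column means, add back the grand mean
    row_mean = dist.mean(dim=1, keepdim=True)
    col_mean = dist.mean(dim=0, keepdim=True)
    grand_mean = dist.mean()
    return dist - row_mean - col_mean + grand_mean

def brownian_distance_covariance(support_feat, query_feat):
    """Scalar Brownian Distance Covariance between two (N, D) feature sets
    with matched spatial positions."""
    a_hat = bdc_matrix(support_feat)
    b_hat = bdc_matrix(query_feat)
    n = support_feat.shape[0]
    return (a_hat * b_hat).sum() / (n * n)

# toy usage: 64 spatial positions, 256 channels each for support and query
s = torch.randn(64, 256)
q = torch.randn(64, 256)
print(brownian_distance_covariance(s, q))
```

A larger value indicates stronger statistical dependence between the two feature sets; in the paper this kind of matrix is used to relate support and query features while retaining spatial-channel structure, rather than collapsing them into a single prototype vector.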
Source journal: IEEE Transactions on Circuits and Systems for Video Technology
CiteScore: 13.80
Self-citation rate: 27.40%
Articles per year: 660
Review time: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.