Authors: Yuxuan Luo; Jinpeng Chen; Runmin Cong; Horace Ho Shing Ip; Sam Kwong
DOI: 10.1109/TCSVT.2025.3554013
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 35, no. 9, pp. 9190-9204 (Impact Factor: 11.1)
Published: 2025-03-24 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10937737/
Concept-Level Semantic Transfer and Context-Level Distribution Modeling for Few-Shot Segmentation
Few-shot segmentation (FSS) methods aim to segment objects using only a few pixel-level annotated samples. Current approaches either derive a generalized class representation from support samples to guide the segmentation of query samples, which often discards crucial spatial contextual information, or rely heavily on spatial affinity between support and query samples, without adequately summarizing and utilizing the core information of the target class. Consequently, the former struggles with fine detail accuracy, while the latter tends to produce errors in overall localization. To address these issues, we propose a novel FSS framework, CCFormer, which balances the transmission of core semantic concepts with the modeling of spatial context, improving both macro and micro-level segmentation accuracy. Our approach introduces three key modules: 1) the Concept Perception Generation (CPG) module, which leverages pre-trained category perception capabilities to capture high-quality core representations of the target class; 2) the Concept-Feature Integration (CFI) module, which injects the core class information into both support and query features during feature extraction; and 3) the Contextual Distribution Mining (CDM) module, which utilizes a Brownian Distance Covariance matrix to model the spatial-channel distribution between support and query samples, preserving the fine-grained integrity of the target. Experimental results on the PASCAL-$5^{i}$ and COCO-$20^{i}$ datasets demonstrate that CCFormer achieves state-of-the-art performance, with visualizations further validating its effectiveness. Our code is available at github.com/lourise/ccformer.
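The CDM module's distribution modeling rests on the Brownian Distance Covariance (BDC) matrix. As a rough illustration of that statistic only (not the paper's released implementation; the function names and the convention of treating each channel as a random variable observed over spatial positions are assumptions here), a minimal PyTorch sketch:

```python
import torch


def bdc_matrix(feats: torch.Tensor) -> torch.Tensor:
    """Brownian Distance Covariance matrix of a feature map.

    feats: (C, N) tensor -- C channels, N spatial positions (H*W flattened).
    Returns the (C, C) double-centred pairwise Euclidean distance matrix,
    the standard BDC construction (Szekely et al.).
    """
    # Pairwise squared Euclidean distances between channel vectors.
    sq = (feats ** 2).sum(dim=1, keepdim=True)        # (C, 1)
    d2 = sq + sq.t() - 2.0 * feats @ feats.t()        # (C, C)
    dist = d2.clamp(min=0.0).sqrt()                   # Euclidean distances

    # Double-centring: subtract row/column means, add back the grand mean.
    row_mean = dist.mean(dim=1, keepdim=True)
    col_mean = dist.mean(dim=0, keepdim=True)
    grand_mean = dist.mean()
    return dist - row_mean - col_mean + grand_mean


def bdc_similarity(support: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
    """Scalar BDC similarity between two (C, H, W) feature maps,
    i.e. the inner product of their BDC matrices."""
    a = bdc_matrix(support.flatten(1))  # (C, H, W) -> (C, H*W)
    b = bdc_matrix(query.flatten(1))
    return (a * b).sum()


# Toy usage with random support/query feature maps.
sup = torch.randn(256, 8, 8)
qry = torch.randn(256, 8, 8)
print(bdc_similarity(sup, qry))
```

Unlike a single pooled prototype, this matrix statistic retains joint spatial-channel structure, which is the property the abstract credits for preserving the fine-grained integrity of the target.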
Journal Introduction:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.