{"title":"Hierarchical Multi-Prototype Discrimination: Boosting Support-Query Matching for Few-Shot Segmentation","authors":"Wenbo Xu;Huaxi Huang;Yongshun Gong;Litao Yu;Qiang Wu;Jian Zhang","doi":"10.1109/TMM.2025.3586125","DOIUrl":null,"url":null,"abstract":"Few-shot segmentation (FSS) aims at training a model on base classes with sufficient annotations and then tasking the model with predicting a binary mask to identify novel class pixels with limited labeled images. Mainstream FSS methods adopt a support-query matching paradigm that activates target regions of the query image according to their similarity with a single support class prototype. However, this prototype vector is inclined to overfit the support images, leading to potential under-matching in latent query object regions and incorrect mismatches with base class features in the query image. To address these issues, this study reformulates conventional single foreground prototype matching to a multi-prototype matching paradigm. In this paradigm, query features exhibiting high confidence with non-target prototypes will be categorized as background. Specifically, the target query features are drawn closer to the novel class prototype through a Masked Cross-Image Encoding (MCE) module and a Semantic Multi-prototype Matching (SMM) module is incorporated to collaboratively filter unexpected base class regions on multi-scale features. Furthermore, we devise an adaptive class activation map, termed target-aware class activation map (TCAM) to preserve semantically coherent regions that might be inadvertently suppressed under pixel-wise matching guidance. Experimental results on PASCAL-5<inline-formula><tex-math>$^{i}$</tex-math></inline-formula> and COCO-20<inline-formula><tex-math>$^{i}$</tex-math></inline-formula> datasets demonstrate the advantage of the proposed novel modules, with the holistic approach outperforming compared state-of-the-art methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"6705-6718"},"PeriodicalIF":9.7000,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11071640/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Few-shot segmentation (FSS) aims at training a model on base classes with sufficient annotations and then tasking the model with predicting a binary mask to identify novel class pixels with limited labeled images. Mainstream FSS methods adopt a support-query matching paradigm that activates target regions of the query image according to their similarity with a single support class prototype. However, this prototype vector is inclined to overfit the support images, leading to potential under-matching in latent query object regions and incorrect mismatches with base class features in the query image. To address these issues, this study reformulates conventional single foreground prototype matching to a multi-prototype matching paradigm. In this paradigm, query features exhibiting high confidence with non-target prototypes will be categorized as background. Specifically, the target query features are drawn closer to the novel class prototype through a Masked Cross-Image Encoding (MCE) module and a Semantic Multi-prototype Matching (SMM) module is incorporated to collaboratively filter unexpected base class regions on multi-scale features. Furthermore, we devise an adaptive class activation map, termed target-aware class activation map (TCAM) to preserve semantically coherent regions that might be inadvertently suppressed under pixel-wise matching guidance. Experimental results on PASCAL-5$^{i}$ and COCO-20$^{i}$ datasets demonstrate the advantage of the proposed novel modules, with the holistic approach outperforming compared state-of-the-art methods.
期刊介绍:
The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.