分层多原型判别:促进支持查询匹配的少镜头分割

IF 9.7 1区 计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Wenbo Xu;Huaxi Huang;Yongshun Gong;Litao Yu;Qiang Wu;Jian Zhang
{"title":"分层多原型判别:促进支持查询匹配的少镜头分割","authors":"Wenbo Xu;Huaxi Huang;Yongshun Gong;Litao Yu;Qiang Wu;Jian Zhang","doi":"10.1109/TMM.2025.3586125","DOIUrl":null,"url":null,"abstract":"Few-shot segmentation (FSS) aims at training a model on base classes with sufficient annotations and then tasking the model with predicting a binary mask to identify novel class pixels with limited labeled images. Mainstream FSS methods adopt a support-query matching paradigm that activates target regions of the query image according to their similarity with a single support class prototype. However, this prototype vector is inclined to overfit the support images, leading to potential under-matching in latent query object regions and incorrect mismatches with base class features in the query image. To address these issues, this study reformulates conventional single foreground prototype matching to a multi-prototype matching paradigm. In this paradigm, query features exhibiting high confidence with non-target prototypes will be categorized as background. Specifically, the target query features are drawn closer to the novel class prototype through a Masked Cross-Image Encoding (MCE) module and a Semantic Multi-prototype Matching (SMM) module is incorporated to collaboratively filter unexpected base class regions on multi-scale features. Furthermore, we devise an adaptive class activation map, termed target-aware class activation map (TCAM) to preserve semantically coherent regions that might be inadvertently suppressed under pixel-wise matching guidance. Experimental results on PASCAL-5<inline-formula><tex-math>$^{i}$</tex-math></inline-formula> and COCO-20<inline-formula><tex-math>$^{i}$</tex-math></inline-formula> datasets demonstrate the advantage of the proposed novel modules, with the holistic approach outperforming compared state-of-the-art methods.","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"6705-6718"},"PeriodicalIF":9.7000,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hierarchical Multi-Prototype Discrimination: Boosting Support-Query Matching for Few-Shot Segmentation\",\"authors\":\"Wenbo Xu;Huaxi Huang;Yongshun Gong;Litao Yu;Qiang Wu;Jian Zhang\",\"doi\":\"10.1109/TMM.2025.3586125\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Few-shot segmentation (FSS) aims at training a model on base classes with sufficient annotations and then tasking the model with predicting a binary mask to identify novel class pixels with limited labeled images. Mainstream FSS methods adopt a support-query matching paradigm that activates target regions of the query image according to their similarity with a single support class prototype. However, this prototype vector is inclined to overfit the support images, leading to potential under-matching in latent query object regions and incorrect mismatches with base class features in the query image. To address these issues, this study reformulates conventional single foreground prototype matching to a multi-prototype matching paradigm. In this paradigm, query features exhibiting high confidence with non-target prototypes will be categorized as background. Specifically, the target query features are drawn closer to the novel class prototype through a Masked Cross-Image Encoding (MCE) module and a Semantic Multi-prototype Matching (SMM) module is incorporated to collaboratively filter unexpected base class regions on multi-scale features. Furthermore, we devise an adaptive class activation map, termed target-aware class activation map (TCAM) to preserve semantically coherent regions that might be inadvertently suppressed under pixel-wise matching guidance. Experimental results on PASCAL-5<inline-formula><tex-math>$^{i}$</tex-math></inline-formula> and COCO-20<inline-formula><tex-math>$^{i}$</tex-math></inline-formula> datasets demonstrate the advantage of the proposed novel modules, with the holistic approach outperforming compared state-of-the-art methods.\",\"PeriodicalId\":13273,\"journal\":{\"name\":\"IEEE Transactions on Multimedia\",\"volume\":\"27 \",\"pages\":\"6705-6718\"},\"PeriodicalIF\":9.7000,\"publicationDate\":\"2025-07-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Multimedia\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/11071640/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Multimedia","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11071640/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

摘要

少镜头分割(Few-shot segmentation, FSS)的目的是在具有足够注释的基类上训练模型,然后将预测二值掩码的任务分配给模型,以在有限的标记图像中识别新的类像素。主流的FSS方法采用支持-查询匹配范式,根据查询图像的目标区域与单个支持类原型的相似性激活查询图像的目标区域。然而,该原型向量倾向于对支持图像进行过拟合,从而导致潜在查询对象区域的潜在不匹配以及与查询图像中的基类特征不匹配。为了解决这些问题,本研究将传统的单一前景原型匹配重新定义为多原型匹配范式。在这个范例中,对非目标原型表现出高置信度的查询特征将被归类为背景。具体而言,通过掩码交叉图像编码(mask Cross-Image Encoding, MCE)模块使目标查询特征更接近新类原型,并结合语义多原型匹配(Semantic Multi-prototype Matching, SMM)模块在多尺度特征上协同过滤意外基类区域。此外,我们设计了一个自适应类激活图,称为目标感知类激活图(TCAM),以保留语义上连贯的区域,这些区域可能在像素匹配指导下无意中被抑制。在PASCAL-5$^{i}$和COCO-20$^{i}$数据集上的实验结果证明了所提出的新模块的优势,整体方法优于比较先进的方法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Hierarchical Multi-Prototype Discrimination: Boosting Support-Query Matching for Few-Shot Segmentation
Few-shot segmentation (FSS) aims at training a model on base classes with sufficient annotations and then tasking the model with predicting a binary mask to identify novel class pixels with limited labeled images. Mainstream FSS methods adopt a support-query matching paradigm that activates target regions of the query image according to their similarity with a single support class prototype. However, this prototype vector is inclined to overfit the support images, leading to potential under-matching in latent query object regions and incorrect mismatches with base class features in the query image. To address these issues, this study reformulates conventional single foreground prototype matching to a multi-prototype matching paradigm. In this paradigm, query features exhibiting high confidence with non-target prototypes will be categorized as background. Specifically, the target query features are drawn closer to the novel class prototype through a Masked Cross-Image Encoding (MCE) module and a Semantic Multi-prototype Matching (SMM) module is incorporated to collaboratively filter unexpected base class regions on multi-scale features. Furthermore, we devise an adaptive class activation map, termed target-aware class activation map (TCAM) to preserve semantically coherent regions that might be inadvertently suppressed under pixel-wise matching guidance. Experimental results on PASCAL-5$^{i}$ and COCO-20$^{i}$ datasets demonstrate the advantage of the proposed novel modules, with the holistic approach outperforming compared state-of-the-art methods.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
IEEE Transactions on Multimedia
IEEE Transactions on Multimedia 工程技术-电信学
CiteScore
11.70
自引率
11.00%
发文量
576
审稿时长
5.5 months
期刊介绍: The IEEE Transactions on Multimedia delves into diverse aspects of multimedia technology and applications, covering circuits, networking, signal processing, systems, software, and systems integration. The scope aligns with the Fields of Interest of the sponsors, ensuring a comprehensive exploration of research in multimedia.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信