Incremental few-shot instance segmentation via feature enhancement and prototype calibration

IF 3.5 | CAS Zone 3 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computer Vision and Image Understanding Pub Date : 2025-02-12 DOI:10.1016/j.cviu.2025.104317

Weixiang Gao, Caijuan Shi, Rui Wang, Ao Cai, Changyu Duan, Meiqin Liu

Volume 253, Article 104317 | https://www.sciencedirect.com/science/article/pii/S1077314225000402

Citations: 0

Abstract

Incremental few-shot instance segmentation (iFSIS) aims to detect and segment instances of novel classes with only a few training samples, while maintaining performance on base classes without revisiting base class data. iMTFA, a representative iFSIS method, offers a flexible approach for adding novel classes. Its key mechanism involves generating novel class weights by normalizing and averaging embeddings obtained from K-shot novel instances. However, relying on such a small sample size often leads to insufficient representation of the real class distribution, which in turn results in biased weights for the novel classes. Furthermore, due to the absence of novel fine-tuning, iMTFA tends to predict potential novel class foregrounds as background, which exacerbates the bias in the generated novel class weights. To overcome these limitations, we propose a simple but effective iFSIS method, named Enhancement and Calibration-based iMTFA (EC-iMTFA). Specifically, we first design an embedding enhancement and aggregation (EEA) module, which enhances the feature diversity of each novel instance embedding before generating novel class weights. We then design a novel prototype calibration (NPC) module that leverages the well-calibrated base class and background weights in the classifier to enhance the discriminability of novel class prototypes. In addition, a simple weight preprocessing (WP) mechanism is designed based on NPC to improve the calibration process further. Extensive experiments on COCO and VOC datasets demonstrate that EC-iMTFA outperforms iMTFA in terms of iFSIS and iFSOD performance, stability, and efficiency without requiring novel fine-tuning. Moreover, EC-iMTFA achieves competitive results compared to recent state-of-the-art methods.
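The weight-generation step the abstract attributes to iMTFA (normalize each K-shot embedding, then average) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy 2-D embeddings, and the use of cosine similarity for scoring are assumptions made for the example, and the EEA, NPC, and WP modules are not modeled because the abstract does not specify them.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def novel_class_weight(shot_embeddings):
    """Normalize each of the K instance embeddings, then average them."""
    if not shot_embeddings:
        raise ValueError("need at least one shot")
    normalized = [l2_normalize(e) for e in shot_embeddings]
    k = len(normalized)
    dim = len(normalized[0])
    return [sum(e[d] for e in normalized) / k for d in range(dim)]


def cosine_score(embedding, weight):
    """Cosine similarity between a query embedding and a class weight.

    Assumption: a cosine-style classifier is used here only for illustration.
    """
    e, w = l2_normalize(embedding), l2_normalize(weight)
    return sum(a * b for a, b in zip(e, w))


# Hypothetical toy data: K = 2 shots of one novel class, 2-D embeddings.
shots = [[3.0, 4.0], [0.0, 2.0]]
weight = novel_class_weight(shots)
print(weight)
```

With only K shots, the averaged vector depends heavily on which few instances happen to be sampled, which is the source of the bias the abstract describes.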




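Source journal

Computer Vision and Image Understanding (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days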

About the journal: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research Areas Include: • Theory • Early vision • Data structures and representations • Shape • Range • Motion • Matching and recognition • Architecture and languages • Vision systems