Instance-level feature representation calibration for visual object detection

IF 3.4 | Zone 2 (Engineering & Technology) | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Displays | Pub Date: 2025-07-03 | DOI: 10.1016/j.displa.2025.103130

Hua Zhang, Jingzhi Li, Wenqi Ren, Chaopeng Li, Xiaochun Cao

Displays, volume 90, Article 103130. Full text: https://www.sciencedirect.com/science/article/pii/S0141938225001672

Citations: 0

Abstract

Few-shot object detection has gained significant attention due to the scarcity of training samples in real-world applications. Most existing methods attempt to transfer knowledge learned from abundant base classes to novel class detection, typically following a two-stage process: base training and fine-tuning. While these detectors excel at object localization, they often struggle with classification due to biased feature representations of instances. In this paper, we propose a novel framework for feature representation learning in few-shot object detection, aimed at refining the instance representation using a prototype-based supervised contrastive learning approach. Specifically, we design a prototype representation bank that serves as a template for supervised contrastive learning and introduce a positive example learning strategy to obtain generalized and discriminative object features. Additionally, we introduce a balanced cross-entropy loss that dynamically adjusts the weightings of gradients from positive and negative samples, thereby enhancing the confidence in object recognition. Extensive experiments on the Pascal VOC and MS-COCO benchmarks show that our method achieves state-of-the-art performance, with significant improvements across most splits and shot settings.
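The abstract does not give the loss formulas, so the following is only a minimal sketch of what a prototype-based supervised contrastive objective could look like, not the authors' implementation. It assumes one running-mean prototype per class and an InfoNCE-style cross-entropy over cosine similarities to those prototypes. The names `PrototypeBank` and `prototype_contrastive_loss` and the parameters `momentum` and `temperature` are hypothetical choices made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


class PrototypeBank:
    """Hypothetical bank holding one running-mean prototype per class."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.protos = np.zeros((num_classes, dim))
        self.seen = np.zeros(num_classes, dtype=bool)
        self.momentum = momentum  # assumed EMA factor, not from the paper

    def update(self, feats, labels):
        """Blend the batch mean of each class into its stored prototype."""
        feats = l2_normalize(feats)
        for c in np.unique(labels):
            mean_c = feats[labels == c].mean(axis=0)
            if self.seen[c]:
                m = self.momentum
                self.protos[c] = m * self.protos[c] + (1.0 - m) * mean_c
            else:
                self.protos[c] = mean_c
                self.seen[c] = True
            self.protos[c] = l2_normalize(self.protos[c])


def prototype_contrastive_loss(feats, labels, protos, temperature=0.1):
    """Cross-entropy over instance-to-prototype cosine similarities.

    The prototype of an instance's own class is the positive; all other
    prototypes act as negatives (an assumed formulation).
    """
    feats = l2_normalize(feats)
    logits = feats @ protos.T / temperature            # shape (N, C)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()


# Usage on synthetic instance features: three classes in an 8-d space.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 1, 1, 2, 2])
feats = np.eye(3, 8)[labels] + 0.05 * rng.standard_normal((6, 8))

bank = PrototypeBank(num_classes=3, dim=8)
bank.update(feats, labels)
loss = prototype_contrastive_loss(feats, labels, bank.protos)
```

Minimizing a loss of this shape pulls each instance feature toward its class prototype and away from the others, which is one plausible reading of "refining the instance representation". The balanced cross-entropy term mentioned in the abstract is not sketched here because its weighting rule is not described.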


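Source journal: Displays (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 4.60

Self-citation rate: 25.60%

Articles published: 138

Review time: 92 days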

About the journal: Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface. Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals intended for display technologies and human factor engineers new to the field will also occasionally be featured.