Beyond decoders: Learning prompt-aware features for few-shot object counting

IF 6.5 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing · Pub Date: 2025-07-14 · DOI: 10.1016/j.neucom.2025.130997

Muming Zhao, Guang Li, Piotr Koniusz, Chongyang Zhang, Yongshun Gong

Volume 651, Article 130997 · Full text: https://www.sciencedirect.com/science/article/pii/S0925231225016698

Citations: 0

Abstract


Beyond decoders: Learning prompt-aware features for few-shot object counting

Few-shot object counting involves estimating the quantity of objects from an arbitrary category in an image, given a few exemplars as visual prompts. This is typically achieved by matching image and exemplar features to establish a class-agnostic similarity map, which is used to regress a density map for the target class. Prevailing approaches primarily focus on improving the matching phase, designing various intricate decoders to perform sophisticated feature correlation. However, these methods still face challenges when initial features lack discriminative power. In this work, we shift our focus from decoder design to learning discriminative prompt-aware image features, enabling more effective similarity matching and density estimation. Specifically, we first establish a straightforward baseline that leverages a transformer-based backbone to enable direct interactions between images and exemplars. To ensure effective feature learning given limited exemplars, we further introduce a class-relevant prompts guided prediction module, which enhances the backbone’s ability to learn discriminative features by incorporating class-relevant visual cues and auxiliary training objectives. This module is designed to be auxiliary and can be discarded at inference, ensuring no additional computational overhead. Extensive experiments on FSC147 and CARPK demonstrate the effectiveness of our method, highlighting the efficacy of learning prompt-aware feature representations for few-shot counting.
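The generic pipeline the abstract describes (match image features against an exemplar feature to get a class-agnostic similarity map, regress a density map, then read off the count) can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the authors' method: the function names, the 2-dimensional toy features, and the clamp-at-zero step standing in for a learned density regressor are all hypothetical choices made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_map(image_feats, exemplar_feat):
    """Cosine similarity between every spatial feature and one exemplar feature.

    image_feats: H x W grid (nested lists) of C-dimensional feature vectors.
    exemplar_feat: a single C-dimensional feature vector for the exemplar.
    """
    e = l2_normalize(exemplar_feat)
    return [
        [sum(a * b for a, b in zip(l2_normalize(f), e)) for f in row]
        for row in image_feats
    ]


def count_from_density(density):
    """In density-map counting, the predicted count is the sum of the map."""
    return sum(sum(row) for row in density)


# Toy 2 x 3 feature grid with 2-dimensional features (hypothetical values).
image_feats = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]],
]
exemplar_feat = [2.0, 0.0]

sim = similarity_map(image_feats, exemplar_feat)

# Assumption: a real model would regress density with a learned decoder;
# here the similarity is simply clamped at zero as a stand-in.
density = [[max(s, 0.0) for s in row] for row in sim]
count = count_from_density(density)
print(count)
```

In this toy grid, three locations share the exemplar's feature direction, so the stand-in density sums to three. The sketch also makes the abstract's point concrete: if the backbone features were less discriminative, the similarity map would be noisy no matter how elaborate the downstream decoder.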


Source journal

Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.