Muming Zhao, Guang Li, Piotr Koniusz, Chongyang Zhang, Yongshun Gong
Neurocomputing, Volume 651, Article 130997. Published 2025-07-14. DOI: 10.1016/j.neucom.2025.130997. https://www.sciencedirect.com/science/article/pii/S0925231225016698
Beyond decoders: Learning prompt-aware features for few-shot object counting
Few-shot object counting involves estimating the number of objects of an arbitrary category in an image, given a few exemplars as visual prompts. This is typically achieved by matching image and exemplar features to establish a class-agnostic similarity map, which is then used to regress a density map for the target class. Prevailing approaches primarily focus on improving the matching phase, designing various intricate decoders to perform sophisticated feature correlation. However, these methods still struggle when the initial features lack discriminative power. In this work, we shift our focus from decoder design to learning discriminative prompt-aware image features, enabling more effective similarity matching and density estimation. Specifically, we first establish a straightforward baseline that leverages a transformer-based backbone to enable direct interactions between images and exemplars. To ensure effective feature learning given limited exemplars, we further introduce a class-relevant-prompt-guided prediction module, which enhances the backbone's ability to learn discriminative features by incorporating class-relevant visual cues and auxiliary training objectives. This module is auxiliary and can be discarded at inference, so it adds no computational overhead at test time. Extensive experiments on FSC147 and CARPK demonstrate the effectiveness of our method, highlighting the value of learning prompt-aware feature representations for few-shot counting.
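The conventional matching pipeline that the abstract contrasts against (exemplar features matched against image features to form a class-agnostic similarity map, from which a density map is regressed) can be illustrated with a small sketch. This is not the authors' implementation. It assumes plain Python lists as stand-ins for backbone features, assumes cosine similarity with a max over the K exemplars as the matching function, and omits the learned density regressor entirely; it only shows the standard density-map readout, in which the predicted count is the sum of the map. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def similarity_map(image_feats: List[List[Vector]],
                   exemplar_feats: List[Vector]) -> List[List[float]]:
    """Class-agnostic similarity map over an H x W grid of D-dim features.

    Assumption (illustrative only): each location keeps its best cosine
    match against any of the K exemplar (prompt) features.
    """
    return [
        [max(cosine(f, e) for e in exemplar_feats) for f in row]
        for row in image_feats
    ]


def count_from_density(density: List[List[float]]) -> float:
    """Density-map readout: the predicted count is the sum (integral) of the map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    # Toy 2 x 3 grid of 2-D features and K = 2 exemplar features.
    img = [[[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
           [[0.0, 1.0], [2.0, 0.0], [0.0, 1.0]]]
    ex = [[1.0, 0.0], [1.0, 1.0]]
    print(similarity_map(img, ex))
    # A hypothetical density map in which two objects each contribute total mass 1.
    print(count_from_density([[0.25, 0.25, 0.0], [0.25, 0.25, 1.0]]))
```

Taking the max over exemplars lets any single prompt that matches a location dominate; averaging over exemplars would be an equally plausible assumption. The abstract's argument is that when the underlying features are weakly discriminative, no choice of matching function recovers a clean similarity map, which is why the paper targets the features themselves.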
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.