SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting

IF 13.7

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society Pub Date : 2025-07-17 DOI:10.1109/TIP.2025.3588255

Hefeng Wu;Yandong Chen;Lingbo Liu;Tianshui Chen;Keze Wang;Liang Lin

Volume 34, pages 4631-4645. Journal Article. https://ieeexplore.ieee.org/document/11083681/

Citation count: 0

Abstract

Class-agnostic counting (CAC) was recently proposed to count all objects of an arbitrary class in an input image, given only a few exemplars from that image. Existing leading methods all rely on density map regression, which makes them impractical for downstream tasks that require object locations and limits how well they can exploit the scale information of exemplars for supervision. They also generally model the interaction between the input image and the exemplars one exemplar at a time, which is inefficient and may not fully synthesize information from all exemplars. To address these limitations, we propose a novel localization-based CAC approach, termed Scale-modulated Query and Localization Network (SQLNet). It exploits the scales of exemplars in both the query and localization stages and counts by accurately locating each object and predicting its approximate size. Specifically, in the query stage, the Hierarchical Exemplars Collaborative Enhancement (HECE) module acquires rich discriminative representations of the target class from the few exemplars through multi-scale exemplar cooperation with equifrequent size prompt embedding. These representations are then fed into the Exemplars-Unified Query Correlation (EUQC) module, which interacts with the query features in a unified manner and produces the correlated query tensor. In the localization stage, the Scale-aware Multi-head Localization (SAML) module uses the query tensor to predict the confidence, location, and size of each potential object. Moreover, a scale-aware localization loss is introduced, which exploits flexible location associations and exemplar scales for supervision. Extensive experiments demonstrate that SQLNet outperforms state-of-the-art methods on popular CAC benchmarks, not only in counting accuracy but also in localization and bounding box generation.
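To make the idea of counting by localization concrete, the following is a minimal sketch of how a count and bounding boxes could be read off per-object predictions of confidence, location, and size. It is not the authors' implementation: the `Candidate` container, the `count_and_box` function, the square-box assumption, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Candidate:
    """One candidate object (hypothetical container, not taken from the paper)."""
    confidence: float  # score in [0, 1]
    x: float           # predicted centre x, in pixels
    y: float           # predicted centre y, in pixels
    size: float        # approximate side length, in pixels (square box assumed)


def count_and_box(candidates: List[Candidate],
                  threshold: float = 0.5) -> Tuple[int, List[Box]]:
    """Keep candidates whose confidence reaches the (assumed) threshold.

    The count is the number of kept candidates; each kept candidate also
    yields a square box centred on its predicted location.
    """
    kept = [c for c in candidates if c.confidence >= threshold]
    boxes = [
        (c.x - c.size / 2, c.y - c.size / 2,
         c.x + c.size / 2, c.y + c.size / 2)
        for c in kept
    ]
    return len(kept), boxes


if __name__ == "__main__":
    preds = [
        Candidate(0.9, 10.0, 10.0, 4.0),
        Candidate(0.2, 50.0, 50.0, 6.0),
        Candidate(0.7, 30.0, 20.0, 8.0),
    ]
    count, boxes = count_and_box(preds)
    print(count, boxes)
```

Unlike a density map, whose integral gives only a total, this kind of output attaches a location and an approximate extent to every counted object, which is what downstream tasks that need object positions require.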

