Towards a novel few-shot object counting paradigm with point-guided exemplar prompt in Industrial Internet of Things

IF 6.2 · CAS Tier 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, THEORY & METHODS
Gangzheng Zhai, Shaojie Han, Kun Chen, Shihui Zhang
{"title":"面向工业物联网中点导向样例提示的少镜头物体计数新范式","authors":"Gangzheng Zhai ,&nbsp;Shaojie Han ,&nbsp;Kun Chen ,&nbsp;Shihui Zhang","doi":"10.1016/j.future.2025.107946","DOIUrl":null,"url":null,"abstract":"<div><div>With the rapid development of computing infrastructure and the increasing demand for big data processing, object counting has emerged as a critical and challenging task. Few-Shot Object Counting (FSOC) aims to estimate the number of objects in any category based on a few visual exemplar prompts. Existing methods typically rely on bounding boxes to guide the model in understanding the correlation between visual exemplars and the query image, followed by regressing a density map for counting. However, despite the growing overall average performance, we contend that the exploration of more generic counting frameworks has not received adequate attention. In this work, we propose a novel Point-guided Exemplar Prompting Network (PEPNet), a new framework that uses point annotations as prompts to guide object counting. PEPNet consists of two core components: a Multi-scale Attention Fusion Module (MAFM) and an Iterative Encoding Matching Module (IEMM). MAFM integrates spatial and channel attention mechanisms to adaptively highlight critical regions while capturing multi-scale features, effectively balancing global context and local details. IEMM, for the first time, employs a point-guided prompting strategy to iteratively encode visual exemplars, suppressing irrelevant features and enhancing important ones. In particular, the multi-head similarity matching block in IEMM refines the matching process progressively, improving the correlation between exemplars and the query image, thereby boosting object recognition and counting accuracy. Extensive experiments on multiple benchmark datasets, including FSC-147, Val-COCO, Test-COCO, CARPK, and ShanghaiTech, demonstrate the effectiveness of PEPNet. Notably, on the FSC-147 validation set, our method achieves a performance improvement of 1.9% in Mean Absolute Error (MAE) and 12.3% in Root Mean Square Error (RMSE) compared to the state-of-the-art SPDCN. Additionally, on the test set, we observe performance improvements of 0.2% in MAE and 21.5% in RMSE. The source code is available at <span><span>https://github.com/zhaigz/PEPNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"174 ","pages":"Article 107946"},"PeriodicalIF":6.2000,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Towards a novel few-shot object counting paradigm with point-guided exemplar prompt in Industrial Internet of Things\",\"authors\":\"Gangzheng Zhai ,&nbsp;Shaojie Han ,&nbsp;Kun Chen ,&nbsp;Shihui Zhang\",\"doi\":\"10.1016/j.future.2025.107946\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>With the rapid development of computing infrastructure and the increasing demand for big data processing, object counting has emerged as a critical and challenging task. Few-Shot Object Counting (FSOC) aims to estimate the number of objects in any category based on a few visual exemplar prompts. Existing methods typically rely on bounding boxes to guide the model in understanding the correlation between visual exemplars and the query image, followed by regressing a density map for counting. 
However, despite the growing overall average performance, we contend that the exploration of more generic counting frameworks has not received adequate attention. In this work, we propose a novel Point-guided Exemplar Prompting Network (PEPNet), a new framework that uses point annotations as prompts to guide object counting. PEPNet consists of two core components: a Multi-scale Attention Fusion Module (MAFM) and an Iterative Encoding Matching Module (IEMM). MAFM integrates spatial and channel attention mechanisms to adaptively highlight critical regions while capturing multi-scale features, effectively balancing global context and local details. IEMM, for the first time, employs a point-guided prompting strategy to iteratively encode visual exemplars, suppressing irrelevant features and enhancing important ones. In particular, the multi-head similarity matching block in IEMM refines the matching process progressively, improving the correlation between exemplars and the query image, thereby boosting object recognition and counting accuracy. Extensive experiments on multiple benchmark datasets, including FSC-147, Val-COCO, Test-COCO, CARPK, and ShanghaiTech, demonstrate the effectiveness of PEPNet. Notably, on the FSC-147 validation set, our method achieves a performance improvement of 1.9% in Mean Absolute Error (MAE) and 12.3% in Root Mean Square Error (RMSE) compared to the state-of-the-art SPDCN. Additionally, on the test set, we observe performance improvements of 0.2% in MAE and 21.5% in RMSE. The source code is available at <span><span>https://github.com/zhaigz/PEPNet</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":55132,\"journal\":{\"name\":\"Future Generation Computer Systems-The International Journal of Escience\",\"volume\":\"174 \",\"pages\":\"Article 107946\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Generation Computer Systems-The International Journal of Escience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167739X25002419\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X25002419","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

With the rapid development of computing infrastructure and the increasing demand for big data processing, object counting has emerged as a critical and challenging task. Few-Shot Object Counting (FSOC) aims to estimate the number of objects in any category based on a few visual exemplar prompts. Existing methods typically rely on bounding boxes to guide the model in understanding the correlation between visual exemplars and the query image, followed by regressing a density map for counting. However, despite the growing overall average performance, we contend that the exploration of more generic counting frameworks has not received adequate attention. In this work, we propose a novel Point-guided Exemplar Prompting Network (PEPNet), a new framework that uses point annotations as prompts to guide object counting. PEPNet consists of two core components: a Multi-scale Attention Fusion Module (MAFM) and an Iterative Encoding Matching Module (IEMM). MAFM integrates spatial and channel attention mechanisms to adaptively highlight critical regions while capturing multi-scale features, effectively balancing global context and local details. IEMM, for the first time, employs a point-guided prompting strategy to iteratively encode visual exemplars, suppressing irrelevant features and enhancing important ones. In particular, the multi-head similarity matching block in IEMM refines the matching process progressively, improving the correlation between exemplars and the query image, thereby boosting object recognition and counting accuracy. Extensive experiments on multiple benchmark datasets, including FSC-147, Val-COCO, Test-COCO, CARPK, and ShanghaiTech, demonstrate the effectiveness of PEPNet. Notably, on the FSC-147 validation set, our method achieves a performance improvement of 1.9% in Mean Absolute Error (MAE) and 12.3% in Root Mean Square Error (RMSE) compared to the state-of-the-art SPDCN. Additionally, on the test set, we observe performance improvements of 0.2% in MAE and 21.5% in RMSE. The source code is available at https://github.com/zhaigz/PEPNet.
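The abstract describes two mechanisms, multi-scale attention fusion and point-guided exemplar matching, without code. Below is a minimal, hypothetical PyTorch sketch of those two ideas, not the authors' implementation (which is released at https://github.com/zhaigz/PEPNet): a channel-plus-spatial attention gate over a feature map, and a point-prompted matcher that samples exemplar features at annotated locations and compares them to the query feature map with a multi-head cosine similarity. All class and function names (`AttentionFusion`, `point_guided_similarity`) are assumptions for illustration only.

```python
# Minimal sketch (not the authors' implementation): (1) channel + spatial attention
# fusion over a feature map, and (2) point-guided exemplar prompting via bilinear
# sampling at point annotations followed by multi-head similarity matching.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionFusion(nn.Module):
    """Channel attention (squeeze-and-excitation style) followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention: global average pool -> MLP -> per-channel gate.
        w = self.channel_mlp(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        x = x * w
        # Spatial attention: mean/max over channels -> conv -> per-pixel gate.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.max(dim=1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.spatial_conv(s))


def point_guided_similarity(query_feat: torch.Tensor,
                            points: torch.Tensor,
                            num_heads: int = 4) -> torch.Tensor:
    """Sample exemplar features at point prompts and match them to the query map.

    query_feat: (B, C, H, W) feature map of the query image.
    points:     (B, K, 2) point prompts, normalized to [-1, 1] as (x, y).
    Returns a (B, K * num_heads, H, W) similarity volume that a decoder could
    regress into a density map.
    """
    b, c, h, w = query_feat.shape
    assert c % num_heads == 0
    # Bilinearly sample one feature vector per annotated point.
    grid = points.view(b, -1, 1, 2)                                # (B, K, 1, 2)
    ex = F.grid_sample(query_feat, grid, align_corners=False)      # (B, C, K, 1)
    ex = ex.squeeze(-1).permute(0, 2, 1)                           # (B, K, C)

    # Split channels into heads and compute cosine similarity per head.
    q = F.normalize(query_feat.view(b, num_heads, c // num_heads, h * w), dim=2)
    e = F.normalize(ex.reshape(b, -1, num_heads, c // num_heads), dim=3)
    sim = torch.einsum("bkhd,bhdn->bkhn", e, q)                    # (B, K, heads, H*W)
    return sim.reshape(b, -1, h, w)


if __name__ == "__main__":
    feat = AttentionFusion(64)(torch.randn(2, 64, 32, 32))
    sim = point_guided_similarity(feat, torch.rand(2, 3, 2) * 2 - 1)
    print(sim.shape)  # torch.Size([2, 12, 32, 32])
```

In this sketch the exemplar vectors are read directly off the query feature map at the prompted points, so a single point per object is enough to form a template; the paper's IEMM additionally refines this encoding iteratively, which is omitted here for brevity.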
Source journal
CiteScore: 19.90
Self-citation rate: 2.70%
Articles per year: 376
Review time: 10.6 months
Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.