Gangzheng Zhai, Shaojie Han, Kun Chen, Shihui Zhang
Future Generation Computer Systems-The International Journal of Escience, volume 174, Article 107946. Published 2025-06-09. DOI: 10.1016/j.future.2025.107946. Impact factor: 6.2; JCR: Q1 (Computer Science, Theory & Methods). URL: https://www.sciencedirect.com/science/article/pii/S0167739X25002419
Towards a novel few-shot object counting paradigm with point-guided exemplar prompt in Industrial Internet of Things
With the rapid development of computing infrastructure and the increasing demand for big data processing, object counting has emerged as a critical and challenging task. Few-Shot Object Counting (FSOC) aims to estimate the number of objects in any category based on a few visual exemplar prompts. Existing methods typically rely on bounding boxes to guide the model in understanding the correlation between visual exemplars and the query image, followed by regressing a density map for counting. However, despite the growing overall average performance, we contend that the exploration of more generic counting frameworks has not received adequate attention. In this work, we propose a novel Point-guided Exemplar Prompting Network (PEPNet), a new framework that uses point annotations as prompts to guide object counting. PEPNet consists of two core components: a Multi-scale Attention Fusion Module (MAFM) and an Iterative Encoding Matching Module (IEMM). MAFM integrates spatial and channel attention mechanisms to adaptively highlight critical regions while capturing multi-scale features, effectively balancing global context and local details. IEMM, for the first time, employs a point-guided prompting strategy to iteratively encode visual exemplars, suppressing irrelevant features and enhancing important ones. In particular, the multi-head similarity matching block in IEMM refines the matching process progressively, improving the correlation between exemplars and the query image, thereby boosting object recognition and counting accuracy. Extensive experiments on multiple benchmark datasets, including FSC-147, Val-COCO, Test-COCO, CARPK, and ShanghaiTech, demonstrate the effectiveness of PEPNet. Notably, on the FSC-147 validation set, our method achieves a performance improvement of 1.9% in Mean Absolute Error (MAE) and 12.3% in Root Mean Square Error (RMSE) compared to the state-of-the-art SPDCN. Additionally, on the test set, we observe performance improvements of 0.2% in MAE and 21.5% in RMSE. The source code is available at https://github.com/zhaigz/PEPNet.
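For readers unfamiliar with how density-map counting is scored, the following is a minimal sketch of the usual convention: the predicted count is the sum of the density map, and MAE and RMSE are computed over per-image counts. It is not taken from the PEPNet repository; the function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def count_from_density(density_map):
    """Predicted count is the sum of all density values (assumed convention)."""
    return sum(sum(row) for row in density_map)

def mae(pred_counts, gt_counts):
    """Mean Absolute Error over per-image counts."""
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)

def rmse(pred_counts, gt_counts):
    """Root Mean Square Error over per-image counts."""
    squared = sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts))
    return math.sqrt(squared / len(gt_counts))

# Toy data (hypothetical): two 2x2 density maps and their ground-truth counts.
density_maps = [
    [[0.5, 0.5], [1.0, 0.0]],      # sums to 2.0
    [[0.25, 0.25], [0.25, 0.25]],  # sums to 1.0
]
gt_counts = [3, 1]

pred_counts = [count_from_density(d) for d in density_maps]
print(mae(pred_counts, gt_counts))   # 0.5
print(rmse(pred_counts, gt_counts))  # about 0.7071
```

Because RMSE squares each error before averaging, it penalizes large miscounts on individual images more heavily than MAE does, which is one reason the two metrics can improve by very different amounts.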
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.