Scale-Aware Crowd Counting Network With Annotation Error Modeling

IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society). Pub Date: 2025-04-24. DOI: 10.1109/TIP.2025.3555116

Yi-Kuan Hsieh;Jun-Wei Hsieh;Xin Li;Yu-Ming Zhang;Yu-Chee Tseng;Ming-Ching Chang

Volume 34, pages 2750-2764. IEEE Xplore: https://ieeexplore.ieee.org/document/10976492/


Abstract

Traditional crowd-counting networks lose information when pooling layers downsample the feature maps, which leads to inaccurate counts for distant crowds. Existing methods often assume that annotations are correct during training and disregard the impact of noisy annotations, especially in crowded scenes. Furthermore, a fixed Gaussian density model does not account for the way the pixel distribution varies with camera distance. To overcome these challenges, we propose a Scale-Aware Crowd Counting Network (SACC-Net) that introduces a scale-aware loss function capable of compensating for errors in noisy annotations. For the first time, we simultaneously model labeling errors (mean) and scale variations (variance) with spatially varying Gaussian distributions to produce fine-grained density maps for crowd counting. Furthermore, the proposed scale-aware Gaussian density model can be dynamically approximated with a low-rank approximation, which improves convergence efficiency while maintaining comparable accuracy. To create a smoother scale-aware feature space, this paper proposes a novel Synthetic Fusion Module (SFM) and an Intra-block Fusion Module (IFM) to generate fine-grained heat maps for better crowd counting. The lightweight version of our model, named SACC-LW, improves computational efficiency while retaining accuracy. The superiority and generalization properties of the scale-aware loss function are extensively evaluated across different backbone architectures and performance metrics on six public datasets: UCF-QNRF, UCF_CC_50, NWPU, ShanghaiTech A, ShanghaiTech B, and JHU. Experimental results also demonstrate that SACC-Net outperforms all state-of-the-art methods, validating its effectiveness in achieving superior crowd-counting accuracy. The source code is available at https://github.com/Naughty725.
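The idea of a spatially varying Gaussian density map, in which each annotated head has its own mean and its own variance, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name gaussian_density_map, the offsets argument (standing in for an annotation-error correction of the mean), and the sigmas argument (standing in for a scale-dependent spread) are assumptions introduced here for illustration, and in SACC-Net such quantities would come from the model rather than being supplied by hand.

```python
import numpy as np


def gaussian_density_map(shape, points, offsets, sigmas):
    """Sum one normalized 2-D Gaussian per annotated head.

    shape   : (height, width) of the output map
    points  : N annotated (row, col) positions
    offsets : N (d_row, d_col) shifts of the mean (assumed annotation correction)
    sigmas  : N spreads in pixels (assumed to grow for heads closer to the camera)
    """
    h, w = shape
    rows = np.arange(h, dtype=np.float64)[:, None]  # column vector of row indices
    cols = np.arange(w, dtype=np.float64)[None, :]  # row vector of column indices
    density = np.zeros(shape, dtype=np.float64)
    for (r, c), (dr, dc), s in zip(points, offsets, sigmas):
        # Isotropic Gaussian centred on the shifted annotation.
        sq_dist = (rows - (r + dr)) ** 2 + (cols - (c + dc)) ** 2
        kernel = np.exp(-sq_dist / (2.0 * s ** 2))
        # Normalize over the image so each head adds exactly 1 to the count.
        density += kernel / kernel.sum()
    return density


# Hypothetical annotations: three heads at increasing apparent size.
points = np.array([[20.0, 30.0], [45.0, 10.0], [50.0, 52.0]])
offsets = np.array([[0.5, -1.0], [0.0, 0.0], [-0.5, 0.5]])
sigmas = np.array([1.5, 3.0, 5.0])

density = gaussian_density_map((64, 64), points, offsets, sigmas)
estimated_count = density.sum()  # integrates to the number of annotated heads
```

Because every kernel is normalized over the image, the map integrates to the number of annotations regardless of the chosen offsets and spreads; a fixed-Gaussian baseline corresponds to zero offsets and a single shared sigma.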

