Scale-Aware Crowd Counting Network With Annotation Error Modeling
Authors: Yi-Kuan Hsieh; Jun-Wei Hsieh; Xin Li; Yu-Ming Zhang; Yu-Chee Tseng; Ming-Ching Chang
Journal: IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society, vol. 34, pp. 2750-2764
Publication date: 2025-04-24
Publication type: Journal Article
DOI: 10.1109/TIP.2025.3555116
URL: https://ieeexplore.ieee.org/document/10976492/
Citations: 0
Abstract
Traditional crowd-counting networks lose information when pooling layers downsample feature maps, which leads to inaccurate counts for distant crowds. Existing methods often assume that annotations are correct during training and disregard the impact of noisy annotations, especially in crowded scenes. Furthermore, a fixed Gaussian density model does not account for the way pixel distributions vary with camera distance. To overcome these challenges, we propose a Scale-Aware Crowd Counting Network (SACC-Net) that introduces a scale-aware loss function capable of compensating for errors in noisy annotations. For the first time, we simultaneously model labeling errors (mean) and scale variations (variance) with spatially varying Gaussian distributions to produce fine-grained density maps for crowd counting. Furthermore, the proposed scale-aware Gaussian density model can be dynamically approximated with a low-rank approximation, which improves convergence efficiency while keeping accuracy comparable. To create a smoother scale-aware feature space, this paper proposes a novel Synthetic Fusion Module (SFM) and an Intra-block Fusion Module (IFM) to generate fine-grained heat maps for better crowd counting. The lightweight version of our model, named SACC-LW, improves computational efficiency while retaining accuracy. The superiority and generalization properties of the scale-aware loss function are extensively evaluated across different backbone architectures and performance metrics on six public datasets: UCF-QNRF, UCF_CC_50, NWPU, ShanghaiTech A, ShanghaiTech B, and JHU. Experimental results also demonstrate that SACC-Net outperforms all state-of-the-art methods, validating its effectiveness in achieving superior crowd-counting accuracy. The source code is available at https://github.com/Naughty725.
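To make the idea of spatially varying Gaussians concrete, the following is a minimal NumPy sketch and not the SACC-Net implementation. It assumes each annotated head contributes one isotropic Gaussian whose mean is shifted by a per-head offset (standing in for annotation error) and whose standard deviation differs per head (standing in for scale). The function names `density_map` and `low_rank`, the sample points, offsets, and sigmas are all hypothetical. A truncated SVD is used here only as a stand-in for the low-rank approximation, whose exact form the abstract does not specify.

```python
import numpy as np

def density_map(points, offsets, sigmas, shape):
    """Sum one normalized 2-D Gaussian per annotated head.

    points  : (N, 2) annotated (row, col) positions
    offsets : (N, 2) assumed shift of each Gaussian mean (annotation error)
    sigmas  : (N,)   assumed per-head spread (scale)
    shape   : (H, W) of the output map
    """
    h, w = shape
    rows = np.arange(h, dtype=np.float64)[:, None]  # column vector, (H, 1)
    cols = np.arange(w, dtype=np.float64)[None, :]  # row vector, (1, W)
    dmap = np.zeros(shape, dtype=np.float64)
    for (r, c), (dr, dc), s in zip(points, offsets, sigmas):
        sq_dist = (rows - (r + dr)) ** 2 + (cols - (c + dc)) ** 2
        g = np.exp(-sq_dist / (2.0 * s ** 2))
        dmap += g / g.sum()  # normalize so each head adds exactly 1 to the count
    return dmap

def low_rank(matrix, rank):
    """Rank-`rank` approximation via truncated SVD (an illustrative stand-in)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

# Hypothetical annotations: three heads with different error offsets and scales.
points = np.array([[10.0, 12.0], [30.0, 40.0], [50.0, 20.0]])
offsets = np.array([[0.5, -0.5], [0.0, 1.0], [-1.0, 0.0]])
sigmas = np.array([1.5, 3.0, 5.0])

dmap = density_map(points, offsets, sigmas, (64, 64))
approx = low_rank(dmap, rank=3)
print("estimated count:", dmap.sum())
print("rank-3 max abs error:", np.abs(dmap - approx).max())
```

Because each isotropic Gaussian on a pixel grid is the outer product of a row profile and a column profile, a map built from N heads has rank at most N, so in this toy setting a rank-3 truncation recovers the three-head map up to floating-point error. A real density map with thousands of heads would need a much smaller rank than N for the approximation to save work, and the trade-off between rank and accuracy would have to be measured.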