Learning scalable Omni-scale distribution for crowd counting
Huake Wang, Xingsong Hou, Kaibing Zhang, Xin Zeng, Minqi Li, Wenke Sun, Xueming Qian
Journal of Visual Communication and Image Representation, vol. 107, Article 104387, published 2025-01-18. DOI: 10.1016/j.jvcir.2025.104387
Abstract
Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches addressed this problem by learning multi-scale features and concatenating them to improve performance. However, such naive fusion is heuristic and suboptimal across a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which exploits the different scale distributions captured by multi-layer feature maps to approximate the real distribution of target scales. Inspired by the Gaussian Mixture Model, our SODF module tackles multi-scale feature fusion from a probabilistic perspective and adaptively integrates multi-layer feature maps without embedding any multi-scale structures. The SODF module consists of two major components: an interaction block that perceives the real distribution and an assignment block that assigns weights to the multi-layer or multi-column feature maps. The proposed SODF module is scalable, lightweight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) that combines the SODF module with a multi-layer structure. Extensive experiments on four benchmark datasets show that the proposed SODF-Net performs favorably against state-of-the-art counting models. Furthermore, the proposed SODF module can effectively improve the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.
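The abstract describes SODF only at a high level. To make the Gaussian Mixture Model analogy concrete, the snippet below is a minimal, parameter-free sketch and not the authors' SODF module. It treats each of K aligned layer outputs F_k as a mixture component and computes per-pixel weights pi_k that are non-negative and sum to one, so the fused feature is sum_k pi_k * F_k, mirroring a GMM density sum_k pi_k * N(x | mu_k, Sigma_k). The "interaction" step here (a mean context vector over layers) and the "assignment" step (a softmax over scaled dot products with that context) are hypothetical stand-ins chosen for illustration. The names mixture_fuse, softmax and dot are illustrative. The sketch also assumes the multi-layer maps have already been resized to a common spatial size and channel count.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mixture_fuse(maps: List[List[Vector]]) -> Tuple[List[Vector], List[Vector]]:
    """Fuse K aligned feature maps with per-pixel mixture weights.

    maps[k][p] is the C-dim feature of layer k at (flattened) pixel p.
    Assumption: all layers share the same number of pixels and channels.
    Returns (fused features per pixel, mixture weights per pixel).
    """
    num_layers = len(maps)
    num_pixels = len(maps[0])
    channels = len(maps[0][0])
    fused, all_weights = [], []
    for p in range(num_pixels):
        feats = [maps[k][p] for k in range(num_layers)]
        # Hypothetical "interaction" stand-in: mean feature across layers.
        context = [sum(f[c] for f in feats) / num_layers for c in range(channels)]
        # Hypothetical "assignment" stand-in: scaled similarity -> mixture weights.
        logits = [dot(f, context) / math.sqrt(channels) for f in feats]
        weights = softmax(logits)
        # Convex combination of the layer features at this pixel.
        fused.append([sum(w * f[c] for w, f in zip(weights, feats)) for c in range(channels)])
        all_weights.append(weights)
    return fused, all_weights


if __name__ == "__main__":
    # Two hypothetical layers, 2 pixels, 2 channels each.
    shallow = [[1.0, 0.0], [0.2, 0.2]]
    deep = [[0.0, 1.0], [2.0, 2.0]]
    fused, weights = mixture_fuse([shallow, deep])
    for p, (f, w) in enumerate(zip(fused, weights)):
        print(f"pixel {p}: weights={[round(x, 3) for x in w]} fused={[round(x, 3) for x in f]}")
```

In this toy input, pixel 0 receives equal weights because both layers are equally similar to the context, while pixel 1 leans toward the stronger deep-layer response. In a real counting network these steps would be learned layers operating on tensors; the sketch only illustrates that a softmax-normalized assignment yields an adaptive, per-location mixture of layer features rather than a fixed concatenation.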
About the journal:
The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.