Learning scalable Omni-scale distribution for crowd counting

IF 2.6 · CAS Tier 4 (Computer Science) · JCR Q2, COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation, Pub Date: 2025-01-18, DOI: 10.1016/j.jvcir.2025.104387

Huake Wang, Xingsong Hou, Kaibing Zhang, Xin Zeng, Minqi Li, Wenke Sun, Xueming Qian

Volume 107, Article 104387
https://www.sciencedirect.com/science/article/pii/S104732032500001X

Citations: 0

Abstract

Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches address this problem by learning multi-scale features and concatenating them to improve performance. However, such naive fusion is intuitive but not optimal across a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which leverages the different scale distributions captured by multi-layer feature maps to approximate the real distribution of target scales. Inspired by the Gaussian Mixture Model, which lets multi-scale feature fusion be approached from a probabilistic perspective, our SODF module adaptively integrates multi-layer feature maps without embedding any multi-scale structures. The SODF module consists of two major components: an interaction block that perceives the real distribution and an assignment block that assigns weights to the multi-layer or multi-column feature maps. The SODF module is scalable, lightweight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) that combines the SODF module with a multi-layer structure. Extensive experiments on four benchmark datasets show that SODF-Net performs favorably against state-of-the-art counting models. Furthermore, the SODF module can efficiently improve the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.
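The abstract describes SODF only at the level of its two blocks, so the following is a hypothetical, framework-free sketch of the GMM-style fusion idea, not the authors' implementation. An interaction block scores each of K feature maps at every pixel, an assignment block turns those scores into softmax weights that are non-negative and sum to 1 (the analogue of GMM mixing coefficients pi_k), and the output is the weighted sum F(x) = sum_k pi_k(x) F_k(x). The function names, the per-channel projection `proj`, and the use of a cross-layer mean as shared context are assumptions made for illustration; in a real network these would be learned layers, and the K maps would first be resized to a common shape.

```python
import math
from typing import List, Sequence

# Channel-first nested lists: maps[k][c][i][j] is layer k, channel c, row i, col j.
# Assumption: all K maps were already resized to a common C x H x W shape.
Tensor3 = List[List[List[float]]]


def interaction_logits(maps: Sequence[Tensor3], proj: Sequence[Sequence[float]]) -> Tensor3:
    """Hypothetical interaction block: one score per layer per pixel.

    proj[k] is a per-channel weight vector for layer k, standing in for a
    learned 1x1 convolution. Each score also sees the cross-layer channel mean,
    so the K scores at a pixel depend on all layers, not only their own.
    """
    K, C = len(maps), len(maps[0])
    H, W = len(maps[0][0]), len(maps[0][0][0])
    logits = [[[0.0] * W for _ in range(H)] for _ in range(K)]
    for i in range(H):
        for j in range(W):
            ctx = [sum(maps[k][c][i][j] for k in range(K)) / K for c in range(C)]
            for k in range(K):
                logits[k][i][j] = sum(
                    proj[k][c] * (maps[k][c][i][j] + ctx[c]) for c in range(C)
                )
    return logits


def assignment_weights(logits: Tensor3) -> Tensor3:
    """Hypothetical assignment block: softmax over the K layers at each pixel,
    so the weights are non-negative and sum to 1, like GMM mixing coefficients."""
    K, H, W = len(logits), len(logits[0]), len(logits[0][0])
    weights = [[[0.0] * W for _ in range(H)] for _ in range(K)]
    for i in range(H):
        for j in range(W):
            m = max(logits[k][i][j] for k in range(K))  # subtract max for stability
            exps = [math.exp(logits[k][i][j] - m) for k in range(K)]
            total = sum(exps)
            for k in range(K):
                weights[k][i][j] = exps[k] / total
    return weights


def sodf_fuse(maps: Sequence[Tensor3], proj: Sequence[Sequence[float]]) -> Tensor3:
    """Per-pixel mixture: fused[c][i][j] = sum_k w[k][i][j] * maps[k][c][i][j]."""
    weights = assignment_weights(interaction_logits(maps, proj))
    K, C = len(maps), len(maps[0])
    H, W = len(maps[0][0]), len(maps[0][0][0])
    return [
        [
            [sum(weights[k][i][j] * maps[k][c][i][j] for k in range(K)) for j in range(W)]
            for i in range(H)
        ]
        for c in range(C)
    ]


if __name__ == "__main__":
    # Two single-channel 1x2 maps, e.g. a shallow and a deep layer.
    shallow = [[[1.0, 0.0]]]
    deep = [[[3.0, 2.0]]]
    # Zero projections give equal weights, i.e. a plain average of the layers.
    print(sodf_fuse([shallow, deep], proj=[[0.0], [0.0]]))  # [[[2.0, 1.0]]]
```

Because the weights are normalized over whatever K is supplied, the same routine accepts three columns (as in MCNN) or several backbone stages without structural changes, which loosely mirrors the "scalable, plug-and-play" property the abstract claims.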




Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 5.40
Self-citation rate: 11.50%
Articles published: 188
Review time: 9.9 months

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.