Learning scalable Omni-scale distribution for crowd counting

IF 2.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS
Huake Wang , Xingsong Hou , Kaibing Zhang , Xin Zeng , Minqi Li , Wenke Sun , Xueming Qian
{"title":"Learning scalable Omni-scale distribution for crowd counting","authors":"Huake Wang ,&nbsp;Xingsong Hou ,&nbsp;Kaibing Zhang ,&nbsp;Xin Zeng ,&nbsp;Minqi Li ,&nbsp;Wenke Sun ,&nbsp;Xueming Qian","doi":"10.1016/j.jvcir.2025.104387","DOIUrl":null,"url":null,"abstract":"<div><div>Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches elaborated on this problem by learning multi-scale features and concatenating them together for more impressive performance. However, such a naive fusion is intuitional and not optimal enough for a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which leverages the benefits of different scale distributions from multi-layer feature maps to approximate the real distribution of target scale. Inspired by Gaussian Mixture Model that surmounts multi-scale feature fusion from a probabilistic perspective, our SODF module adaptively integrate multi-layer feature maps without embedding any multi-scale structures. The SODF module is comprised of two major components: an interaction block that perceives the real distribution and an assignment block which assigns the weights to the multi-layer or multi-column feature maps. The newly proposed SODF module is scalable, light-weight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) with SODF module and multi-layer structure. Extensive experiments on four benchmark datasets manifest that the proposed SODF-Net performs favorably against the state-of-the-art counting models. Furthermore, the proposed SODF module can efficiently improve the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"107 ","pages":"Article 104387"},"PeriodicalIF":2.6000,"publicationDate":"2025-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Visual Communication and Image Representation","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S104732032500001X","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Crowd counting is challenged by large appearance variations of individuals in uncontrolled scenes. Many previous approaches elaborated on this problem by learning multi-scale features and concatenating them together for more impressive performance. However, such a naive fusion is intuitional and not optimal enough for a wide range of scale variations. In this paper, we propose a novel feature fusion scheme, called Scalable Omni-scale Distribution Fusion (SODF), which leverages the benefits of different scale distributions from multi-layer feature maps to approximate the real distribution of target scale. Inspired by Gaussian Mixture Model that surmounts multi-scale feature fusion from a probabilistic perspective, our SODF module adaptively integrate multi-layer feature maps without embedding any multi-scale structures. The SODF module is comprised of two major components: an interaction block that perceives the real distribution and an assignment block which assigns the weights to the multi-layer or multi-column feature maps. The newly proposed SODF module is scalable, light-weight, and plug-and-play, and can be flexibly embedded into other counting networks. In addition, we design a counting model (SODF-Net) with SODF module and multi-layer structure. Extensive experiments on four benchmark datasets manifest that the proposed SODF-Net performs favorably against the state-of-the-art counting models. Furthermore, the proposed SODF module can efficiently improve the prediction performance of canonical counting networks, e.g., MCNN, CSRNet, and CAN.
学习可扩展的全尺度分布人群计数
在不受控制的场景中,个体的巨大外观变化对人群计数提出了挑战。许多以前的方法通过学习多尺度特征并将它们连接在一起来阐述这个问题,以获得更令人印象深刻的性能。然而,这种幼稚的融合是直观的,对于大范围的尺度变化来说不够理想。本文提出了一种新的特征融合方案——可扩展全尺度分布融合(SODF),该方案利用多层特征映射中不同尺度分布的优势来近似目标尺度的真实分布。受高斯混合模型的启发,从概率角度超越了多尺度特征融合,我们的SODF模块自适应集成多层特征映射,而不嵌入任何多尺度结构。SODF模块由两个主要部分组成:感知真实分布的交互块和为多层或多列特征映射分配权重的分配块。新提出的SODF模块具有可扩展、轻量级和即插即用的特点,可以灵活地嵌入到其他计数网络中。此外,我们还设计了一个具有SODF模块和多层结构的计数模型(SODF- net)。在四个基准数据集上进行的大量实验表明,所提出的SODF-Net与最先进的计数模型相比表现良好。此外,所提出的SODF模块可以有效地提高MCNN、CSRNet和can等规范计数网络的预测性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Visual Communication and Image Representation
Journal of Visual Communication and Image Representation 工程技术-计算机:软件工程
CiteScore
5.40
自引率
11.50%
发文量
188
审稿时长
9.9 months
期刊介绍: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信