Scale Adaptive Enhance Network for Crowd Counting

Zirui Fan, Jun Ruan
DOI: 10.1109/ICEIT54416.2022.9690718
Published in: 2022 11th International Conference on Educational and Information Technology (ICEIT)
Publication date: 2022-01-06
Citation count: 0

Abstract

Crowd counting is a fundamental computer vision task and plays a critical role in video structure analysis and potential downstream applications, e.g., accident forecasting and urban traffic analysis. The main challenges of crowd counting lie in the scale variation caused by the disorderly distribution of person-to-camera distances, as well as interference from complex backgrounds. To address these issues, we propose a Scale Adaptive Enhance Network (SAENet) based on the encoder-decoder U-Net architecture. We employ Res2Net as the encoder backbone to extract multi-scale head information and relieve the scale-variation problem. The decoder consists of two branches: an Attention Estimation Network (AENet) that provides attention maps and a Density Estimation Network (DENet) that generates density maps. To fully leverage the complementarity between AENet and DENet, we propose two modules that enhance feature transfer: i) a lightweight, plug-and-play interactive attention module (IA-block), deployed at multiple levels of the decoder to refine the feature maps; ii) a global scale adaptive fusion strategy (GSAFS), which adaptively models diverse scale cues to obtain a weighted density map. Extensive experiments show that the proposed method outperforms existing competitive methods and establishes state-of-the-art results on ShanghaiTech Part A and B and on UCF-QNRF. Our model achieves MAE of 53.56 and 5.95 on ShanghaiTech Part A and B, performance improvements of 6.0% and 13.13%, respectively.
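The abstract does not give the exact formulations of the IA-block or GSAFS. As a minimal per-pixel sketch of the two underlying ideas, one can read the IA-block as attention-gated refinement (a density response modulated by a sigmoid attention score) and GSAFS as a softmax-weighted combination of per-scale density estimates. All function names, and the choice of sigmoid and softmax, are assumptions for illustration, not the paper's actual implementation:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_refine(density, attention_logit):
    """IA-block-style refinement (sketch): gate a density response with a
    sigmoid attention score, suppressing likely-background pixels."""
    attention = 1.0 / (1.0 + math.exp(-attention_logit))
    return density * attention

def scale_adaptive_fuse(per_scale_density, per_scale_logits):
    """GSAFS-style fusion (sketch), per pixel: a softmax over per-scale
    confidence logits yields weights; the fused density is their
    weighted sum, i.e. a convex combination of the scale estimates."""
    weights = softmax(per_scale_logits)
    return sum(w * d for w, d in zip(weights, per_scale_density))

# Toy example: one pixel predicted by three scale branches.
densities = [0.10, 0.40, 0.25]   # per-scale density estimates at this pixel
logits = [0.5, 2.0, -1.0]        # per-scale confidence logits
fused = scale_adaptive_fuse(densities, logits)
# The fused value lies between the min and max per-scale estimates;
# summing the fused density map over all pixels gives the crowd count.
```

Because the fusion is a convex combination, it cannot overshoot any individual scale branch, which matches the intuition of "weighting" rather than "amplifying" scale cues.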