Multi-Dilation Network for Crowd Counting

Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366687

Shuheng Wang, Hanli Wang, Qinyu Li


Citations: 3

Abstract

With the growth of the urban population, crowd analysis has become an important task in computer vision. Crowd counting, a subfield of crowd analysis, aims to count the number of people in an image or in a region of an image. The task remains extremely challenging because of heavy occlusion, perspective distortion, and variations in illumination. Recent state-of-the-art approaches mainly use convolutional neural networks to generate density maps. In this work, the Multi-Dilation Network (MDNet) is proposed for crowd counting in congested scenes. MDNet consists of two parts: a VGG-16 based front end for feature extraction and a back end containing multi-dilation blocks that generate density maps. Specifically, each multi-dilation block has four branches that collect features of different sizes. By using dilated convolutions, the multi-dilation block can capture a variety of features while the maximum kernel size remains 3 x 3. Experiments on two challenging crowd counting datasets, UCF_CC_50 and ShanghaiTech, show that the proposed MDNet outperforms other state-of-the-art methods, with lower mean absolute error and mean squared error. Compared with a network whose multi-scale blocks adopt larger kernels to extract features, MDNet still achieves competitive performance with fewer model parameters.
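The claim that a 3 x 3 kernel can cover differently sized regions without extra parameters can be illustrated with a small dependency-free sketch. It is not the authors' implementation: the function names are hypothetical, the four dilation rates in `ASSUMED_DILATIONS` are an assumption (the abstract does not state them), and the convolution is a single-channel, unpadded toy version on nested lists.

```python
# Hypothetical dilation rates for the four branches; the abstract does not give them.
ASSUMED_DILATIONS = (1, 2, 3, 4)


def effective_kernel_size(kernel_size, dilation):
    """Side length of the input window spanned by a dilated square kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Weights in a 2-D convolution layer (bias ignored).

    Dilation does not appear here: it spreads the taps apart
    without adding any weights.
    """
    return kernel_size * kernel_size * in_channels * out_channels


def dilated_conv2d(image, kernel, dilation):
    """Single-channel 2-D dilated cross-correlation with no padding.

    `image` and `kernel` are lists of rows; `kernel` is square.
    """
    k = len(kernel)
    span = effective_kernel_size(k, dilation)
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for a in range(k):
                for b in range(k):
                    # Taps are `dilation` pixels apart instead of adjacent.
                    total += kernel[a][b] * image[i + a * dilation][j + b * dilation]
            row.append(total)
        output.append(row)
    return output


def branch_summary(in_channels, out_channels):
    """For each assumed branch, compare a dilated 3 x 3 kernel with a dense
    kernel that covers the same window."""
    summary = []
    for d in ASSUMED_DILATIONS:
        span = effective_kernel_size(3, d)
        summary.append({
            "dilation": d,
            "window": span,
            "dilated_weights": conv_weight_count(3, in_channels, out_channels),
            "dense_weights": conv_weight_count(span, in_channels, out_channels),
        })
    return summary
```

Under these assumptions, `branch_summary(64, 64)` reports windows of 3, 5, 7, and 9 pixels for the four branches, each with the same 36,864 weights, whereas a dense 9 x 9 kernel covering the largest window would need 331,776. This is the sense in which dilated branches can remain competitive with larger-kernel multi-scale blocks while using fewer parameters.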

