Multi-Dilation Network for Crowd Counting

Proceedings of the ACM Multimedia Asia. Pub Date: 2019-12-15. DOI: 10.1145/3338533.3366687

Shuheng Wang, Hanli Wang, Qinyu Li


Citations: 3

Abstract



As urban populations grow, crowd analysis has become an important task in computer vision. Crowd counting, a subfield of crowd analysis, aims to count the number of people in an image or in a region of an image. Heavy occlusion and variations in perspective and illumination make accurate crowd counting extremely challenging. Recent state-of-the-art approaches mainly use convolutional neural networks to generate density maps. This work proposes the Multi-Dilation Network (MDNet) for crowd counting in congested scenes. MDNet consists of two parts: a VGG-16-based front end for feature extraction and a back end of multi-dilation blocks that generates density maps. Specifically, each multi-dilation block has four branches that collect features at different scales. By using dilated convolutions, the block captures a variety of features while the maximum kernel size remains 3 × 3. Experiments on two challenging crowd counting datasets, UCF_CC_50 and ShanghaiTech, show that MDNet outperforms other state-of-the-art methods, with lower mean absolute error and mean squared error. Compared with a network whose multi-scale blocks adopt larger kernels to extract features, MDNet remains competitive while using fewer model parameters.
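The parameter argument in the abstract can be illustrated with a little arithmetic. A 3 × 3 kernel with dilation rate d spans a (2d + 1) × (2d + 1) window but still holds only nine weights per input/output channel pair, whereas a dense kernel of the same span holds (2d + 1)² weights. The sketch below tabulates this for four branches. The dilation rates (1, 2, 3, 4) and the channel width (64) are assumptions chosen for illustration, since the abstract does not state them, and the helper names are hypothetical rather than taken from the authors' code.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Side length of the window spanned by a dilated square kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def conv_weight_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in a 2-D convolution layer (bias terms excluded)."""
    return kernel_size * kernel_size * in_channels * out_channels


# Assumed values: the abstract gives neither the dilation rates nor the channel widths.
ASSUMED_DILATIONS = (1, 2, 3, 4)
ASSUMED_CHANNELS = 64


def compare_branches(dilations=ASSUMED_DILATIONS, channels=ASSUMED_CHANNELS):
    """Return (dilation, span, dilated 3x3 weights, dense span-x-span weights) per branch."""
    rows = []
    for d in dilations:
        span = effective_kernel_size(3, d)
        dilated_weights = conv_weight_count(3, channels, channels)
        dense_weights = conv_weight_count(span, channels, channels)
        rows.append((d, span, dilated_weights, dense_weights))
    return rows


if __name__ == "__main__":
    for d, span, dilated_weights, dense_weights in compare_branches():
        print(
            f"dilation={d}: spans {span}x{span}, "
            f"{dilated_weights} weights (dilated 3x3) vs "
            f"{dense_weights} weights (dense {span}x{span})"
        )
```

Under these assumed settings, every dilated branch costs the same number of weights regardless of its span, while the dense alternative grows quadratically with the span. This matches the abstract's qualitative claim but says nothing about the actual parameter counts reported in the paper.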
