A Novel Spatiotemporal Attention Convolutional Neural Network for Video Crowd Counting
Shangjie Zhang, Yuelei Xiao
Proceedings of the 2022 5th International Conference on Artificial Intelligence and Pattern Recognition (published 2022-09-23)
DOI: 10.1145/3573942.3574069
Abstract
Most existing crowd counting methods remain image-based even when applied to video datasets, and therefore ignore valuable temporal information. To address this, a novel spatiotemporal attention convolutional neural network is proposed for video-based crowd counting. First, the initial ten layers of VGG-16 serve as the backbone network for feature extraction, and a single ConvLSTM layer captures the temporal correlation between adjacent frames. Next, stacked dilated convolutional layers enlarge the receptive field without increasing the computational load. Finally, a convolutional block attention module (CBAM) is introduced to adaptively refine the feature maps; by emphasizing or suppressing information along the channel and spatial dimensions, it helps informative features propagate through the network. Experimental results on two benchmark datasets, Mall and WorldExpo'10, show that the proposed method further improves crowd counting accuracy and outperforms other existing crowd counting methods.
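For reference, the ConvLSTM layer mentioned in the abstract can be written with the standard cell equations of Shi et al. (2015). Here X_t would be the backbone feature map of frame t, H_t and C_t are the hidden and cell states, * denotes convolution and \circ the element-wise (Hadamard) product. The abstract does not state the kernel size, number of hidden channels, or whether the peephole terms W_{c\cdot} \circ C are used, so treat this as the generic cell rather than the authors' exact configuration.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Because every gate is computed by convolution rather than a fully connected product, the states keep their spatial layout, which is what lets the layer relate the same image region across adjacent frames.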
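To make the receptive-field claim concrete, the sketch below applies standard receptive-field arithmetic: each layer grows the field by (k_eff - 1) times the accumulated stride, where k_eff = d(k - 1) + 1 for a k x k kernel with dilation d. It assumes "the first ten layers of VGG-16" means the first ten 3x3 convolutions together with the three 2x2 max-pooling layers between them (output stride 8). The six-layer 3x3 back-end, its dilation rate of 2, and the helper names are hypothetical choices for illustration; the abstract does not give the back-end configuration.

```python
def effective_kernel(k, d):
    """Span covered by a k x k kernel with dilation d: d*(k-1) + 1."""
    return d * (k - 1) + 1


def receptive_field(layers):
    """layers: list of (kernel, stride, dilation). Returns (receptive field, total stride)."""
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (effective_kernel(k, d) - 1) * jump  # growth scaled by the stride so far
        jump *= s
    return rf, jump


def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolution; dilation does not change this count."""
    return k * k * c_in * c_out + c_out


CONV, POOL = (3, 1, 1), (2, 2, 1)
# Assumed VGG-16 front end: ten 3x3 convs with three 2x2/stride-2 max-pools.
front_end = [CONV, CONV, POOL, CONV, CONV, POOL, CONV, CONV, CONV, POOL, CONV, CONV, CONV]

# Hypothetical back-ends: six 3x3 convs, undilated vs. dilation 2.
plain_back = [(3, 1, 1)] * 6
dilated_back = [(3, 1, 2)] * 6

rf_front, stride = receptive_field(front_end)
rf_plain, _ = receptive_field(front_end + plain_back)
rf_dilated, _ = receptive_field(front_end + dilated_back)

print(rf_front, stride)          # 92 8
print(rf_plain)                  # 188
print(rf_dilated)                # 284
print(conv_params(3, 512, 512))  # 2359808, identical for any dilation rate
```

With the same number of weights per layer, dilation 2 enlarges the receptive field from 188 to 284 pixels while the feature map stays at 1/8 of the input resolution. The cost per output position is also unchanged, since each dilated kernel still reads nine input values; it simply spaces them further apart.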
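The CBAM refinement described in the abstract can be illustrated with a minimal pure-Python sketch, assuming the standard CBAM design: channel attention from a shared two-layer MLP over average- and max-pooled channel descriptors, followed by spatial attention from a 7x7 convolution over the channel-wise average and max maps, each squashed by a sigmoid and multiplied into the features. The reduction ratio r = 2, the random placeholder weights, the tiny 4 x 5 x 5 feature map, and the function names are hypothetical; the abstract does not say where the module sits in the network or how it is configured.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap, w0, w1):
    """Per-channel weights M_c in (0, 1); fmap is a C x H x W nested list."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(row) for row in ch) for ch in fmap]

    def shared_mlp(v):
        # W0 (C/r x C) then ReLU, then W1 (C x C/r); shared by both descriptors.
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w0]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w1]

    return [sigmoid(a + m) for a, m in zip(shared_mlp(avg), shared_mlp(mx))]


def spatial_attention(fmap, kernel):
    """H x W map M_s in (0, 1) from a 2-channel k x k conv with zero 'same' padding."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    avg_map = [[sum(fmap[k][i][j] for k in range(c)) / c for j in range(w)] for i in range(h)]
    max_map = [[max(fmap[k][i][j] for k in range(c)) for j in range(w)] for i in range(h)]
    pooled = [avg_map, max_map]
    ksize = len(kernel[0])
    pad = ksize // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            acc = 0.0
            for p in range(2):
                for di in range(ksize):
                    for dj in range(ksize):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[p][di][dj] * pooled[p][y][x]
            row.append(sigmoid(acc))
        out.append(row)
    return out


def cbam(fmap, w0, w1, kernel):
    """Channel then spatial refinement: F' = M_c * F, F'' = M_s * F'."""
    mc = channel_attention(fmap, w0, w1)
    refined = [[[mc[k] * v for v in row] for row in ch] for k, ch in enumerate(fmap)]
    ms = spatial_attention(refined, kernel)
    return [[[ms[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]


# Hypothetical demo: random features and untrained placeholder weights.
rng = random.Random(0)
C, H, W, r, k = 4, 5, 5, 2, 7
fmap = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w0 = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
w1 = [[rng.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
kernel = [[[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(k)] for _ in range(2)]
out = cbam(fmap, w0, w1, kernel)
```

Because both attention maps lie strictly between 0 and 1, every output value keeps the sign of its input and never exceeds it in magnitude; "emphasizing" a channel or location therefore means suppressing it less than the others, with later layers free to rescale.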