Adaptive feature learning CNN for behavior recognition in crowd scene

Aliyu Nuhu Shuaibu, A. Malik, I. Faye
{"title":"基于自适应特征学习CNN的人群场景行为识别","authors":"Aliyu Nuhu Shuaibu, A. Malik, I. Faye","doi":"10.1109/ICSIPA.2017.8120636","DOIUrl":null,"url":null,"abstract":"Learning and recognizing 3-dimension (3D) adaptive features are important for crowd scene understanding in video surveillance research. Deep learning architectures such as Convolutional Neural Networks (CNN) have recently shown much success in various computer vision applications. Existing approaches such as hand-crafted method and 2D-CNN architectures are widely used in adaptive feature representations on image data. However, learning dynamic and temporal features in 3D scale features in videos remains an open problem. In this study, we proposed a novel technique 3D-scale Convolutional Neural Network (3DS-CNN), based on the decomposition of 3D feature maps into 2D spatio and 2D temporal feature representations. Extensive experiments on hundreds of video scene were demonstrated on publicly available crowd datasets. Quantitative and qualitative evaluations indicate that the proposed model display superior performance when compared to baseline approaches. The mean average precision of 95.30% was recorded on WWW crowd dataset.","PeriodicalId":268112,"journal":{"name":"2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Adaptive feature learning CNN for behavior recognition in crowd scene\",\"authors\":\"Aliyu Nuhu Shuaibu, A. Malik, I. Faye\",\"doi\":\"10.1109/ICSIPA.2017.8120636\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Learning and recognizing 3-dimension (3D) adaptive features are important for crowd scene understanding in video surveillance research. Deep learning architectures such as Convolutional Neural Networks (CNN) have recently shown much success in various computer vision applications. Existing approaches such as hand-crafted method and 2D-CNN architectures are widely used in adaptive feature representations on image data. However, learning dynamic and temporal features in 3D scale features in videos remains an open problem. In this study, we proposed a novel technique 3D-scale Convolutional Neural Network (3DS-CNN), based on the decomposition of 3D feature maps into 2D spatio and 2D temporal feature representations. Extensive experiments on hundreds of video scene were demonstrated on publicly available crowd datasets. Quantitative and qualitative evaluations indicate that the proposed model display superior performance when compared to baseline approaches. 
The mean average precision of 95.30% was recorded on WWW crowd dataset.\",\"PeriodicalId\":268112,\"journal\":{\"name\":\"2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSIPA.2017.8120636\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSIPA.2017.8120636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6

Abstract

Learning and recognizing three-dimensional (3D) adaptive features is important for crowd scene understanding in video surveillance research. Deep learning architectures such as Convolutional Neural Networks (CNN) have recently shown much success in various computer vision applications. Existing approaches such as hand-crafted methods and 2D-CNN architectures are widely used for adaptive feature representation on image data. However, learning dynamic and temporal features at 3D scale in videos remains an open problem. In this study, we propose a novel technique, the 3D-scale Convolutional Neural Network (3DS-CNN), based on the decomposition of 3D feature maps into 2D spatial and 2D temporal feature representations. Extensive experiments on hundreds of video scenes were conducted on publicly available crowd datasets. Quantitative and qualitative evaluations indicate that the proposed model displays superior performance compared to baseline approaches. A mean average precision of 95.30% was recorded on the WWW crowd dataset.
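The abstract describes the core idea of 3DS-CNN as decomposing a 3D feature map into separate 2D spatial and temporal representations. The paper does not publish its exact layer configuration here, so the sketch below is only an illustration of that general factorization in PyTorch: a frame-wise 2D spatial convolution followed by a 1D temporal convolution. The class name, channel counts, and kernel sizes are assumptions, not the authors' architecture.

```python
# Illustrative sketch only (not the published 3DS-CNN): a 3D convolution is
# factorized into a 2D spatial convolution and a 1D temporal convolution,
# mirroring the spatial/temporal decomposition described in the abstract.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    """Factorized spatial + temporal convolution over a video clip.

    Input shape: (batch, channels, frames, height, width).
    """

    def __init__(self, in_ch: int, out_ch: int, k: int = 3, t: int = 3):
        super().__init__()
        # 2D spatial convolution applied frame by frame (kernel 1 x k x k).
        self.spatial = nn.Conv3d(in_ch, out_ch,
                                 kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))
        # 1D temporal convolution across frames (kernel t x 1 x 1).
        self.temporal = nn.Conv3d(out_ch, out_ch,
                                  kernel_size=(t, 1, 1),
                                  padding=(t // 2, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.relu(self.spatial(x))   # 2D spatial features per frame
        x = self.relu(self.temporal(x))  # temporal dynamics across frames
        return x


if __name__ == "__main__":
    clip = torch.randn(2, 3, 16, 112, 112)  # 2 clips, 16 RGB frames, 112x112
    block = SpatioTemporalBlock(3, 64)
    print(block(clip).shape)  # torch.Size([2, 64, 16, 112, 112])
```

Compared with a full 3D convolution, this kind of factorization reduces parameters and lets the spatial and temporal filters be learned separately, which is the motivation the abstract gives for working with 2D spatial and 2D temporal representations instead of raw 3D kernels.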