Real-Time Spatial-Temporal Depth Separable CNN for Multi-Functional Crowd Analysis in Videos

IF 0.8 Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Santosh Kumar Tripathy, Poonkuntran Shanmugam
DOI: 10.1142/s0219467825500470
Journal: International Journal of Image and Graphics
Published: 2023-11-20 (Journal Article)
Citations: 0

Abstract

Crowd behavior prediction (CBP) and crowd counting (CC) are essential functions of vision-based crowd analysis (CA) and play a crucial role in mitigating crowd disasters. Using separate models for CBP and CC increases computational overhead and introduces synchronization issues. State-of-the-art approaches exploit spatial-temporal features with deep convolutional architectures, but such models suffer from high computational cost during convolution operations. To address these issues, this paper develops a single deep model that performs both CA functions, CBP and CC. The proposed model uses multiple layers of depth-wise separable CNN (DSCNN) to extract fine-grained spatial-temporal features from the scene; compared with a traditional CNN, the DSCNN reduces the number of multiplications performed during convolution. Further, existing datasets support only a single CA function, whereas the proposed model requires a dual-task CA dataset providing ground-truth labels for both CBP and CC. Such a dataset is therefore prepared from a benchmark crowd behavior dataset, MED: around 41,000 frames were manually annotated to obtain ground-truth crowd count values. Experiments on the proposed multi-functional dataset show that the model outperforms state-of-the-art methods on several performance metrics. In addition, the model processes each test frame in 3.40 milliseconds and is therefore readily applicable in real time.
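The abstract's claim that a depth-wise separable convolution needs fewer multiplications than a standard convolution can be checked with a simple count. The sketch below (the layer sizes are illustrative, not taken from the paper) counts multiplications for one layer: a standard convolution computes a k×k×C_in dot product per output position and output channel, while the separable version splits this into a per-channel k×k depthwise filter followed by a 1×1 pointwise convolution, giving a reduction factor of 1/C_out + 1/k².

```python
def standard_conv_mults(h, w, c_in, c_out, k):
    # Each of the h*w output positions, for each of c_out output channels,
    # needs a dot product over a k*k*c_in receptive field.
    return h * w * c_out * k * k * c_in

def separable_conv_mults(h, w, c_in, c_out, k):
    depthwise = h * w * c_in * k * k   # one k*k spatial filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 convolution mixing the channels
    return depthwise + pointwise

# Illustrative layer: 224x224 feature map, 32 -> 64 channels, 3x3 kernel.
std = standard_conv_mults(224, 224, 32, 64, 3)
sep = separable_conv_mults(224, 224, 32, 64, 3)
print(std, sep, sep / std)  # ratio matches 1/c_out + 1/k**2
```

For these sizes the separable layer uses roughly 12.7% of the multiplications of the standard layer, which is the kind of saving the paper leans on to reach its 3.40 ms per-frame processing time.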
Source journal: International Journal of Image and Graphics (COMPUTER SCIENCE, SOFTWARE ENGINEERING)
CiteScore: 2.40
Self-citation rate: 18.80%
Annual publications: 67