基于时间分布深度cnn、rnn和基于注意力机制的实时敌对活动时空特征检测分析

2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS) Pub Date : 2022-12-05 DOI:10.1109/IPAS55744.2022.10053001

Labib Ahmed Siddique, Rabita Junhai, Tanzim Reza, Salman Khan, Tanvir Rahman

{"title":"基于时间分布深度cnn、rnn和基于注意力机制的实时敌对活动时空特征检测分析","authors":"Labib Ahmed Siddique, Rabita Junhai, Tanzim Reza, Salman Khan, Tanvir Rahman","doi":"10.1109/IPAS55744.2022.10053001","DOIUrl":null,"url":null,"abstract":"Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens. In this research, we explore deep learning video classification techniques to detect violence as they are happening. Traditional image classification techniques fall short when it comes to classifying videos as they attempt to classify each frame separately for which the predictions start to flicker. Therefore, many researchers are coming up with video classification techniques that consider spatiotemporal features while classifying. However, deploying these deep learning models with methods such as skeleton points obtained through pose estimation and optical flow obtained through depth sensors, are not always practical in an IoT environment. Although these techniques ensure a higher accuracy score, they are computationally heavier. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques such as ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as feature extractor) CNNTransformer and C3D. We achieved a test accuracy of 80% on ConvLSTM, 83.33% on CNN-BiLSTM, 70% on VGG16-BiLstm, 76.76% on CNN-Transformer and 80% on C3D.","PeriodicalId":322228,"journal":{"name":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","volume":"11303 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms\",\"authors\":\"Labib Ahmed Siddique, Rabita Junhai, Tanzim Reza, Salman Khan, Tanvir Rahman\",\"doi\":\"10.1109/IPAS55744.2022.10053001\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens. In this research, we explore deep learning video classification techniques to detect violence as they are happening. Traditional image classification techniques fall short when it comes to classifying videos as they attempt to classify each frame separately for which the predictions start to flicker. Therefore, many researchers are coming up with video classification techniques that consider spatiotemporal features while classifying. However, deploying these deep learning models with methods such as skeleton points obtained through pose estimation and optical flow obtained through depth sensors, are not always practical in an IoT environment. Although these techniques ensure a higher accuracy score, they are computationally heavier. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques such as ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as feature extractor) CNNTransformer and C3D. We achieved a test accuracy of 80% on ConvLSTM, 83.33% on CNN-BiLSTM, 70% on VGG16-BiLstm, 76.76% on CNN-Transformer and 80% on C3D.\",\"PeriodicalId\":322228,\"journal\":{\"name\":\"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)\",\"volume\":\"11303 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPAS55744.2022.10053001\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPAS55744.2022.10053001","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

通过闭路电视摄像机系统进行的实时视频监控已成为确保公共安全的必要条件，这是当今的优先事项。尽管闭路电视摄像机在提高安全性方面有很大帮助，但这些系统需要持续的人工交互和监控。为了根除这个问题，智能监控系统可以使用深度学习视频分类技术来建立，这可以帮助我们自动化监控系统，在暴力发生时检测暴力。在这项研究中，我们探索了深度学习视频分类技术，以检测正在发生的暴力事件。传统的图像分类技术在对视频进行分类时存在不足，因为它们试图对预测开始闪烁的每一帧进行单独分类。因此，许多研究者提出了在分类时考虑时空特征的视频分类技术。然而，将这些深度学习模型与通过姿态估计获得的骨架点和通过深度传感器获得的光流等方法一起部署在物联网环境中并不总是实用的。尽管这些技术确保了更高的精度分数，但它们的计算量更大。考虑到这些限制，我们实验了各种视频分类和动作识别技术，如ConvLSTM、LRCN(使用自定义CNN层和VGG-16作为特征提取器)、CNNTransformer和C3D。我们在ConvLSTM上实现了80%的测试准确率，在CNN-BiLSTM上达到83.33%，在VGG16-BiLstm上达到70%，在CNN-Transformer上达到76.76%，在C3D上达到80%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Analysis of Real-Time Hostile Activitiy Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms

Real-time video surveillance, through CCTV camera systems has become essential for ensuring public safety which is a priority today. Although CCTV cameras help a lot in increasing security, these systems require constant human interaction and monitoring. To eradicate this issue, intelligent surveillance systems can be built using deep learning video classification techniques that can help us automate surveillance systems to detect violence as it happens. In this research, we explore deep learning video classification techniques to detect violence as they are happening. Traditional image classification techniques fall short when it comes to classifying videos as they attempt to classify each frame separately for which the predictions start to flicker. Therefore, many researchers are coming up with video classification techniques that consider spatiotemporal features while classifying. However, deploying these deep learning models with methods such as skeleton points obtained through pose estimation and optical flow obtained through depth sensors, are not always practical in an IoT environment. Although these techniques ensure a higher accuracy score, they are computationally heavier. Keeping these constraints in mind, we experimented with various video classification and action recognition techniques such as ConvLSTM, LRCN (with both custom CNN layers and VGG-16 as feature extractor) CNNTransformer and C3D. We achieved a test accuracy of 80% on ConvLSTM, 83.33% on CNN-BiLSTM, 70% on VGG16-BiLstm, 76.76% on CNN-Transformer and 80% on C3D.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 IEEE 5th International Conference on Image Processing Applications and Systems (IPAS)

自引率

0.00%

发文量