用于人群行为分类的新型多尺度暴力和公共集会数据集

Frontiers in Computer Science Pub Date : 2024-05-10 DOI:10.3389/fcomp.2024.1242690

Almiqdad Elzein, Emrah Basaran, Yin David Yang, Marwa Qaraqe

{"title":"用于人群行为分类的新型多尺度暴力和公共集会数据集","authors":"Almiqdad Elzein, Emrah Basaran, Yin David Yang, Marwa Qaraqe","doi":"10.3389/fcomp.2024.1242690","DOIUrl":null,"url":null,"abstract":"Dependable utilization of computer vision applications, such as smart surveillance, requires training deep learning networks on datasets that sufficiently represent the classes of interest. However, the bottleneck in many computer vision applications lies in the limited availability of adequate datasets. One particular application that is of great importance for the safety of cities and crowded areas is smart surveillance. Conventional surveillance methods are reactive and often ineffective in enable real-time action. However, smart surveillance is a key component of smart and proactive security in a smart city. Motivated by a smart city application which aims at the automatic identification of concerning events for alerting law-enforcement and governmental agencies, we craft a large video dataset that focuses on the distinction between small-scale violence, large-scale violence, peaceful gatherings, and natural events. This dataset classifies public events along two axes, the size of the crowd observed and the level of perceived violence in the crowd. We name this newly-built dataset the Multi-Scale Violence and Public Gathering (MSV-PG) dataset. The videos in the dataset go through several pre-processing steps to prepare them to be fed into a deep learning architecture. We conduct several experiments on the MSV-PG datasets using a ResNet3D, a Swin Transformer and an R(2 + 1)D architecture. The results achieved by these models when trained on the MSV-PG dataset, 88.37%, 89.76%, and 89.3%, respectively, indicate that the dataset is well-labeled and is rich enough to train deep learning models for automatic smart surveillance for diverse scenarios.","PeriodicalId":510141,"journal":{"name":"Frontiers in Computer Science","volume":" 3","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-05-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A novel multi-scale violence and public gathering dataset for crowd behavior classification\",\"authors\":\"Almiqdad Elzein, Emrah Basaran, Yin David Yang, Marwa Qaraqe\",\"doi\":\"10.3389/fcomp.2024.1242690\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dependable utilization of computer vision applications, such as smart surveillance, requires training deep learning networks on datasets that sufficiently represent the classes of interest. However, the bottleneck in many computer vision applications lies in the limited availability of adequate datasets. One particular application that is of great importance for the safety of cities and crowded areas is smart surveillance. Conventional surveillance methods are reactive and often ineffective in enable real-time action. However, smart surveillance is a key component of smart and proactive security in a smart city. Motivated by a smart city application which aims at the automatic identification of concerning events for alerting law-enforcement and governmental agencies, we craft a large video dataset that focuses on the distinction between small-scale violence, large-scale violence, peaceful gatherings, and natural events. This dataset classifies public events along two axes, the size of the crowd observed and the level of perceived violence in the crowd. We name this newly-built dataset the Multi-Scale Violence and Public Gathering (MSV-PG) dataset. The videos in the dataset go through several pre-processing steps to prepare them to be fed into a deep learning architecture. We conduct several experiments on the MSV-PG datasets using a ResNet3D, a Swin Transformer and an R(2 + 1)D architecture. The results achieved by these models when trained on the MSV-PG dataset, 88.37%, 89.76%, and 89.3%, respectively, indicate that the dataset is well-labeled and is rich enough to train deep learning models for automatic smart surveillance for diverse scenarios.\",\"PeriodicalId\":510141,\"journal\":{\"name\":\"Frontiers in Computer Science\",\"volume\":\" 3\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-05-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Frontiers in Computer Science\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.3389/fcomp.2024.1242690\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Computer Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/fcomp.2024.1242690","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

要可靠地利用计算机视觉应用（如智能监控），需要在充分代表相关类别的数据集上训练深度学习网络。然而，许多计算机视觉应用的瓶颈在于可用的适当数据集有限。智能监控是对城市和人群密集地区的安全具有重要意义的一项特殊应用。传统的监控方法是被动的，往往不能有效地采取实时行动。然而，智能监控是智能城市中智能和主动安全的关键组成部分。智能城市应用旨在自动识别相关事件，以便向执法和政府机构发出警报，受此激励，我们制作了一个大型视频数据集，重点区分小规模暴力、大规模暴力、和平集会和自然事件。该数据集以观察到的人群规模和感知到的人群暴力程度这两条轴线对公共事件进行分类。我们将这个新建立的数据集命名为 "大规模暴力和公共集会（MSV-PG）数据集"。数据集中的视频需要经过多个预处理步骤，以便将其输入深度学习架构。我们使用 ResNet3D、Swin Transformer 和 R(2 + 1)D 架构在 MSV-PG 数据集上进行了多项实验。这些模型在MSV-PG数据集上的训练结果分别为88.37%、89.76%和89.3%，表明该数据集具有良好的标签，其丰富程度足以训练深度学习模型，用于不同场景的自动智能监控。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A novel multi-scale violence and public gathering dataset for crowd behavior classification

Dependable utilization of computer vision applications, such as smart surveillance, requires training deep learning networks on datasets that sufficiently represent the classes of interest. However, the bottleneck in many computer vision applications lies in the limited availability of adequate datasets. One particular application that is of great importance for the safety of cities and crowded areas is smart surveillance. Conventional surveillance methods are reactive and often ineffective in enable real-time action. However, smart surveillance is a key component of smart and proactive security in a smart city. Motivated by a smart city application which aims at the automatic identification of concerning events for alerting law-enforcement and governmental agencies, we craft a large video dataset that focuses on the distinction between small-scale violence, large-scale violence, peaceful gatherings, and natural events. This dataset classifies public events along two axes, the size of the crowd observed and the level of perceived violence in the crowd. We name this newly-built dataset the Multi-Scale Violence and Public Gathering (MSV-PG) dataset. The videos in the dataset go through several pre-processing steps to prepare them to be fed into a deep learning architecture. We conduct several experiments on the MSV-PG datasets using a ResNet3D, a Swin Transformer and an R(2 + 1)D architecture. The results achieved by these models when trained on the MSV-PG dataset, 88.37%, 89.76%, and 89.3%, respectively, indicate that the dataset is well-labeled and is rich enough to train deep learning models for automatic smart surveillance for diverse scenarios.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Frontiers in Computer Science

自引率

0.00%

发文量