{"title":"基于ConvLSTM的视频游戏元数据暴力检测","authors":"Helena A. Correia, José Henrique Brito","doi":"10.1109/SEGAH52098.2021.9551853","DOIUrl":null,"url":null,"abstract":"The automatic detection of violent situations is relevant to monitor exposure to violence, both in the context of the analysis of real video and video generated in virtual environments, namely in simulated scenarios or virtual/mixed reality applications, such as serious games. In this paper, we propose a deep neural network to identify violent videos, with an approach capable of working in real video and synthetic video. An efficient detector of the 2D pose together with a multiple person tracker is used to extract motion features from the video sequence that will be fed directly into the proposed network. The proposed convolutional neural network is a recurrent neural network for a Spatio-temporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions, which enables the analysis of local motion in the video. By stacking multiple ConvLSTM layers and forming an encoding-forecasting structure, we obtain a network model for the violence detection problem and more general spatiotemporal sequence forecasting problems. The inputs for the model correspond to sequences of keypoints extracted from the skeletons present in each frame originating an output corresponding to the classification of the video. The model was trained and evaluated with an innovative dataset that contains violent videos from a popular fighting game and non-violent videos related to people's daily lives. Comparison of the results obtained with the state-of-the-art techniques revealed the promising capability of the proposed method in recognizing violent videos with 100% precision, although it is not as robust as other datasets. Conv-LSTM units are shown to be an effective means for modelling and predicting video sequences.","PeriodicalId":189731,"journal":{"name":"2021 IEEE 9th International Conference on Serious Games and Applications for Health(SeGAH)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Violence detection in video game metadata using ConvLSTM\",\"authors\":\"Helena A. Correia, José Henrique Brito\",\"doi\":\"10.1109/SEGAH52098.2021.9551853\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The automatic detection of violent situations is relevant to monitor exposure to violence, both in the context of the analysis of real video and video generated in virtual environments, namely in simulated scenarios or virtual/mixed reality applications, such as serious games. In this paper, we propose a deep neural network to identify violent videos, with an approach capable of working in real video and synthetic video. An efficient detector of the 2D pose together with a multiple person tracker is used to extract motion features from the video sequence that will be fed directly into the proposed network. The proposed convolutional neural network is a recurrent neural network for a Spatio-temporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions, which enables the analysis of local motion in the video. 
By stacking multiple ConvLSTM layers and forming an encoding-forecasting structure, we obtain a network model for the violence detection problem and more general spatiotemporal sequence forecasting problems. The inputs for the model correspond to sequences of keypoints extracted from the skeletons present in each frame originating an output corresponding to the classification of the video. The model was trained and evaluated with an innovative dataset that contains violent videos from a popular fighting game and non-violent videos related to people's daily lives. Comparison of the results obtained with the state-of-the-art techniques revealed the promising capability of the proposed method in recognizing violent videos with 100% precision, although it is not as robust as other datasets. Conv-LSTM units are shown to be an effective means for modelling and predicting video sequences.\",\"PeriodicalId\":189731,\"journal\":{\"name\":\"2021 IEEE 9th International Conference on Serious Games and Applications for Health(SeGAH)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-08-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 9th International Conference on Serious Games and Applications for Health(SeGAH)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SEGAH52098.2021.9551853\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 9th International Conference on Serious Games and Applications for Health(SeGAH)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SEGAH52098.2021.9551853","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Violence detection in video game metadata using ConvLSTM
The automatic detection of violent situations is relevant for monitoring exposure to violence, both in the analysis of real video and of video generated in virtual environments, namely in simulated scenarios or virtual/mixed reality applications such as serious games. In this paper, we propose a deep neural network to identify violent videos, with an approach capable of working on both real and synthetic video. An efficient 2D pose detector, together with a multi-person tracker, is used to extract motion features from the video sequence, which are fed directly into the proposed network. The proposed network is a recurrent neural network for spatio-temporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions, which enables the analysis of local motion in the video. By stacking multiple ConvLSTM layers and forming an encoding-forecasting structure, we obtain a network model for the violence detection problem and for more general spatio-temporal sequence forecasting problems. The inputs to the model are sequences of keypoints extracted from the skeletons present in each frame, and the output is the classification of the video. The model was trained and evaluated on a new dataset that contains violent videos from a popular fighting game and non-violent videos of people's daily lives. Comparison with state-of-the-art techniques shows the promising capability of the proposed method, which recognizes violent videos with 100% precision, although it is not as robust on other datasets. ConvLSTM units are shown to be an effective means of modelling and predicting video sequences.
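
The abstract describes per-frame skeleton keypoints fed into stacked ConvLSTM layers that end in a binary (violent vs. non-violent) classification of the video. The following is a minimal Keras sketch of that kind of architecture, assuming the keypoints are rasterized into per-frame heatmaps; the input dimensions, number of layers, and filter sizes are illustrative assumptions, not the authors' reported configuration.

```python
# Minimal sketch of a stacked ConvLSTM video classifier over pose-keypoint
# heatmaps. All sizes below are assumptions for illustration only.
import tensorflow as tf

FRAMES, HEIGHT, WIDTH, KEYPOINTS = 32, 64, 64, 17  # assumed input dimensions

model = tf.keras.Sequential([
    tf.keras.Input(shape=(FRAMES, HEIGHT, WIDTH, KEYPOINTS)),
    # Convolutional structures in both the input-to-state and state-to-state
    # transitions let the recurrent cells capture local motion patterns.
    tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                               return_sequences=True),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ConvLSTM2D(32, kernel_size=3, padding="same",
                               return_sequences=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.GlobalAveragePooling2D(),
    # Single sigmoid output: violent vs. non-violent video.
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.Precision()])
model.summary()
```

In such a setup, each training sample is one clip of FRAMES pose heatmaps, and the final hidden state of the last ConvLSTM layer is pooled and mapped to a single violence probability; an encoding-forecasting variant, as mentioned in the abstract, would instead keep `return_sequences=True` and add decoding layers to predict future frames.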