Violence Detection With Two-Stream Neural Network Based on C3D
Zanzan Lu, Xu Xia, Hongrun Wu, Chen Yang
DOI: 10.4018/ijcini.287601 (https://doi.org/10.4018/ijcini.287601)
Published: 2021-10-01 (Journal Article)
Citations: 1
Abstract
In recent years, violence detection has become an important research area in computer vision, and many high-accuracy models have been proposed. However, these methods generalize poorly across different datasets. In this paper, the authors propose a violence detection method based on a two-stream C3D network that extracts spatiotemporal features. First, the authors preprocess the video data for the RGB stream and the optical flow stream separately. Second, they feed the data into two C3D networks, one extracting features from the RGB frames and the other from the optical flow. Third, they fuse the features extracted by the two networks to obtain the final prediction. To evaluate the proposed model, four datasets (two public and two self-built) are used. The experimental results show that the model generalizes well compared with state-of-the-art methods: it performs well on both large-scale and small-scale datasets.
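The fusion step can be illustrated with a minimal sketch. The abstract does not say how the two feature sets are fused, so this example assumes simple concatenation followed by a linear layer and softmax. The function names (`softmax`, `fuse_and_classify`), the toy feature vectors, and the weights are all hypothetical; in a real system the feature vectors would come from the two C3D networks.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(
    rgb_feat: Sequence[float],
    flow_feat: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Fuse per-stream features by concatenation (an assumption, not
    necessarily the authors' scheme), then apply a linear layer and
    softmax to get class probabilities."""
    fused = list(rgb_feat) + list(flow_feat)
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("each weight row must match the fused feature length")
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy stand-ins for the features each C3D stream would produce.
rgb_features = [0.2, 0.9]
flow_features = [0.7, 0.1]

# Hypothetical classifier parameters: row 0 = non-violent, row 1 = violent.
toy_weights = [
    [0.5, -0.5, -0.5, 0.5],
    [-0.5, 0.5, 0.5, -0.5],
]
toy_bias = [0.0, 0.0]

probs = fuse_and_classify(rgb_features, flow_features, toy_weights, toy_bias)
label = "violent" if probs[1] > probs[0] else "non-violent"
print(label, probs)
```

Concatenation keeps both streams' features intact and lets the classifier learn how to weight them. Averaging the per-stream class scores would be an equally plausible alternative under the same assumptions.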