A Deep Fusion Network for Violence Recognition

Authors: Zhimin Song, Wuwei Zhang, Dongyue Chen
Published in: 2022 4th International Conference on Intelligent Information Processing (IIP), October 2022
DOI: 10.1109/iip57348.2022.00029
Citations: 0
Abstract
With the rise of smart city construction, violence recognition based on surveillance video has become increasingly important. Violence in surveillance footage is difficult to characterize: specific violent behaviors are hard to define, the number of participants is unknown, and each participant's degree of involvement differs. These factors undermine the consistency between video-frame labels and video-clip labels. In this paper, we propose a new violence recognition framework: a frame selection strategy based on local differential brightness is designed for accurate selection of violence frames; a deep fusion network, P-VFN, is designed to avoid the mismatch between frame labels and video labels; and several motion-image detection algorithms are compared to explore substitutes for the optical flow method, with the aim of improving the poor real-time performance of current optical flow computation. Experimental results on three challenging benchmark datasets demonstrate that the proposed approach outperforms many state-of-the-art violence recognition models. Furthermore, to compensate for the scarcity of public datasets captured in real surveillance scenes, we use real surveillance cameras to record and produce a large-scale real violence dataset, which also contributes to better performance.
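The abstract does not specify the exact local-differential-brightness criterion used for frame selection. As a rough illustration only, the general idea of picking high-motion candidate frames by brightness differencing can be sketched as below; the function name, the per-pixel differencing scheme, and the threshold value are all our assumptions, not the paper's method:

```python
import numpy as np

def select_candidate_frames(frames, threshold=15.0):
    """Keep frames whose mean absolute brightness change relative to the
    previous frame exceeds a (hypothetical) motion threshold.

    frames: sequence of grayscale frames, each a 2-D uint8 array.
    Returns the indices of frames flagged as high-motion candidates.
    """
    selected = []
    for i in range(1, len(frames)):
        prev = frames[i - 1].astype(np.float32)
        curr = frames[i].astype(np.float32)
        # mean absolute per-pixel brightness difference between frames
        diff = np.abs(curr - prev).mean()
        if diff > threshold:
            selected.append(i)
    return selected

# Toy example: two identical dark frames, then a sudden bright frame.
f0 = np.zeros((4, 4), dtype=np.uint8)
f1 = np.zeros((4, 4), dtype=np.uint8)
f2 = np.full((4, 4), 200, dtype=np.uint8)
print(select_candidate_frames([f0, f1, f2]))  # -> [2]
```

In practice such a pre-filter would feed only the selected frames into the recognition network, reducing the frame/video label mismatch the paper targets; the paper's actual strategy may weight local regions rather than whole-frame means.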