{"title":"Anomaly detection in surveillance videos using Transformer with margin learning","authors":"Dicong Wang, Kaijun Wu","doi":"10.1007/s00530-024-01443-4","DOIUrl":null,"url":null,"abstract":"<p>Weakly supervised video anomaly detection (WSVAD) constitutes a highly research-oriented and challenging project within the domains of image and video processing. In prior studies of WSVAD, it has typically been formulated as a multiple-instance learning (MIL) problem. However, quite a few of these methods tend to primarily concentrate on time periods when anomalies occur discernibly. To recognize anomalous events, they rely solely on detecting significant changes in appearance or motion, ignoring the temporal completeness or continuity that anomalous events possess by nature. In addition, they also disregard the subtle correlations at the transitional boundaries between normal and abnormal states. Therefore, we propose a weakly supervised learning approach based on Transformer with margin learning for video anomaly detection. Specifically, our network effectively captures temporal changes around the occurrence of anomalies by utilizing the benefits of Transformer blocks, which are adept at capturing long-range dependencies in anomalous events. Secondly, to tackle challenging cases, i.e., normal events with high similarity to anomalous events, we employed a hard score memory. The purpose of this memory is to store the anomaly scores of hard samples, enabling iterative optimization training on those hard instances. Additionally, to bolster the discriminative capability of the model at the score level, we utilize pseudo-labels for anomalous events to provide supplementary support in detection. Experiments were conducted on two large-scale datasets, namely the ShanghaiTech dataset and the UCF-Crime dataset, and they achieved highly favorable results. The results of the experiments demonstrate that the proposed method is sensitive to anomalous events while performing competitively against state-of-the-art methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"49 1","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2024-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Multimedia Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00530-024-01443-4","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
Weakly supervised video anomaly detection (WSVAD) constitutes a highly research-oriented and challenging project within the domains of image and video processing. In prior studies of WSVAD, it has typically been formulated as a multiple-instance learning (MIL) problem. However, quite a few of these methods tend to primarily concentrate on time periods when anomalies occur discernibly. To recognize anomalous events, they rely solely on detecting significant changes in appearance or motion, ignoring the temporal completeness or continuity that anomalous events possess by nature. In addition, they also disregard the subtle correlations at the transitional boundaries between normal and abnormal states. Therefore, we propose a weakly supervised learning approach based on Transformer with margin learning for video anomaly detection. Specifically, our network effectively captures temporal changes around the occurrence of anomalies by utilizing the benefits of Transformer blocks, which are adept at capturing long-range dependencies in anomalous events. Secondly, to tackle challenging cases, i.e., normal events with high similarity to anomalous events, we employed a hard score memory. The purpose of this memory is to store the anomaly scores of hard samples, enabling iterative optimization training on those hard instances. Additionally, to bolster the discriminative capability of the model at the score level, we utilize pseudo-labels for anomalous events to provide supplementary support in detection. Experiments were conducted on two large-scale datasets, namely the ShanghaiTech dataset and the UCF-Crime dataset, and they achieved highly favorable results. The results of the experiments demonstrate that the proposed method is sensitive to anomalous events while performing competitively against state-of-the-art methods.
期刊介绍:
This journal details innovative research ideas, emerging technologies, state-of-the-art methods and tools in all aspects of multimedia computing, communication, storage, and applications. It features theoretical, experimental, and survey articles.