Video Anomaly Detection via self-supervised and spatio-temporal proxy tasks learning
Journal: Pattern Recognition (Impact Factor 7.5; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-09-14
Publication type: Journal Article
DOI: 10.1016/j.patcog.2024.111021
URL: https://www.sciencedirect.com/science/article/pii/S0031320324007726
Citations: 0
Abstract
Video Anomaly Detection (VAD) aims to identify events in videos that deviate from typical patterns. Given the scarcity of anomalous samples, previous research has primarily focused on learning regular patterns from datasets containing only normal behaviors and treating deviations from these patterns as anomalies. However, most of these methods are constrained by coarse-grained modeling approaches that render them incapable of learning the highly discriminative features needed to capture the subtle differences between normal and abnormal behaviors. To better capture these features, we propose a method based on self-supervised spatio-temporal proxy tasks. First, pseudo-anomalous samples for appearance and motion are generated through geometric transformations (2D rotations) and the scrambling of video sequences. Next, a dual-branch network featuring spatio-temporal decoupling is proposed, in which the spatial and temporal branches each handle a specific proxy task. These tasks are designed to distinguish between normal and pseudo-anomalous samples, and involve operations such as predicting patch-based 2D rotation angles and classifying video frame triplets as total-anomaly, left-anomaly, right-anomaly, or non-anomaly. Our approach is trained end to end and relies on no pre-trained models other than the object detector. Evaluations on the UCSD Ped2, Avenue, and ShanghaiTech datasets show that our method achieves AUC scores of 99.1%, 91.9%, and 81.1%, respectively, demonstrating its effectiveness. The code is publicly accessible at the following link: https://spatio-temporal-tasks.
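To make the two proxy tasks more concrete, the following is a minimal NumPy sketch of how pseudo-anomalous samples and their labels might be generated. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which rotation angles are used (multiples of 90 degrees are assumed here), nor how the four triplet classes are constructed (here an "anomalous" side of the triplet is assumed to be a neighbouring frame swapped for a temporally distant one). The names `rotate_patch`, `make_triplet`, and `min_gap` are hypothetical.

```python
import numpy as np

# Assumed label order; the abstract names the four classes but not their indices.
TRIPLET_LABELS = ("non-anomaly", "left-anomaly", "right-anomaly", "total-anomaly")


def rotate_patch(patch, k):
    """Rotate an (H, W, C) patch by k * 90 degrees (assumed angle set).

    Returns the rotated patch and k, which serves as the rotation label.
    """
    return np.rot90(patch, k=k, axes=(0, 1)).copy(), k


def make_triplet(clip, t, label, rng, min_gap=3):
    """Build a (left, middle, right) frame triplet around index t.

    clip has shape (T, H, W, C). Under the assumed scheme, each side marked
    anomalous by the label is replaced by a frame at least min_gap steps
    away from t, which breaks temporal continuity on that side.
    Returns the stacked frames and the frame indices that were used.
    """
    num_frames = clip.shape[0]
    if not 1 <= t <= num_frames - 2:
        raise ValueError("t must have a neighbour on both sides")
    far = [i for i in range(num_frames) if abs(i - t) >= min_gap]
    if not far:
        raise ValueError("clip too short for the requested min_gap")

    left, right = t - 1, t + 1
    name = TRIPLET_LABELS[label]
    if name in ("left-anomaly", "total-anomaly"):
        left = int(rng.choice(far))
    if name in ("right-anomaly", "total-anomaly"):
        right = int(rng.choice(far))
    return clip[[left, t, right]], (left, t, right)


rng = np.random.default_rng(0)
clip = rng.random((16, 8, 8, 3))          # toy clip: 16 frames of 8x8 RGB
rotated, rot_label = rotate_patch(clip[0], k=1)
triplet, indices = make_triplet(clip, t=8, label=1, rng=rng)
```

In this sketch a spatial branch would be trained to recover `rot_label` from `rotated`, and a temporal branch to recover the triplet label from `triplet`; how the two predictions are turned into an anomaly score at test time is not described in the abstract and is left out here.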
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.