Video Anomaly Detection via self-supervised and spatio-temporal proxy tasks learning

IF 7.5 | Zone 1, Computer Science | JCR Q1: Computer Science, Artificial Intelligence
Journal: Pattern Recognition
Publication type: Journal Article
Publication date: 2024-09-14
DOI: 10.1016/j.patcog.2024.111021
URL: https://www.sciencedirect.com/science/article/pii/S0031320324007726
Citations: 0

Abstract

Video Anomaly Detection (VAD) aims to identify events in videos that deviate from typical patterns. Given the scarcity of anomalous samples, previous research has primarily focused on learning regular patterns from datasets that contain only normal behaviors and treating deviations from these patterns as anomalies. However, most of these methods are constrained by coarse-grained modeling approaches that render them incapable of learning the highly discriminative features needed to capture the subtle differences between normal and abnormal behaviors. To better capture these features, we propose a method based on self-supervised spatio-temporal proxy tasks. First, pseudo-anomalous samples for appearance and motion are generated through geometric transformations (2D rotations) and the scrambling of video sequences. Next, a dual-branch network featuring spatio-temporal decoupling is proposed, in which the spatial and temporal branches each handle a specific proxy task. These tasks are designed to distinguish between normal and pseudo-anomalous samples: the spatial branch predicts patch-based 2D rotation angles, and the temporal branch classifies video frame triplets as total-anomaly, left-anomaly, right-anomaly, or non-anomaly. Our approach is trained end to end and does not rely on pre-trained models (except for the object detector). Evaluations on the UCSD Ped2, Avenue, and ShanghaiTech datasets show that our method achieves AUC scores of 99.1%, 91.9%, and 81.1%, respectively, demonstrating its effectiveness. The code is publicly accessible at the following link: https://spatio-temporal-tasks.
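
The pseudo-anomaly generation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: rotation angles are taken to be multiples of 90 degrees, a "left-anomaly" triplet is taken to mean that the first frame is swapped for an out-of-order frame, "right-anomaly" the last frame, and "total-anomaly" both. The names `rotate90`, `make_rotation_sample`, and `make_triplet_sample` are hypothetical.

```python
import random
from typing import List, Tuple

Patch = List[List[int]]  # a grayscale patch stored as rows of pixel values

# Assumption: the abstract only says "2D rotations"; multiples of 90 degrees are used here.
ROTATION_ANGLES = (0, 90, 180, 270)

# Assumption: class order and semantics are illustrative, not taken from the paper.
TRIPLET_CLASSES = ("non-anomaly", "left-anomaly", "right-anomaly", "total-anomaly")


def rotate90(patch: Patch) -> Patch:
    """Rotate a patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def make_rotation_sample(patch: Patch, rng: random.Random) -> Tuple[Patch, int]:
    """Return (rotated patch, label); label k means a rotation of k * 90 degrees.

    Label 0 leaves the patch unrotated, i.e. the normal (non-pseudo-anomalous) case.
    """
    label = rng.randrange(len(ROTATION_ANGLES))
    rotated = [list(row) for row in patch]
    for _ in range(label):
        rotated = rotate90(rotated)
    return rotated, label


def make_triplet_sample(
    num_frames: int, center: int, rng: random.Random
) -> Tuple[Tuple[int, int, int], int]:
    """Return (frame indices of a triplet, label) for the temporal proxy task.

    The normal triplet is (center - 1, center, center + 1). For the anomalous
    classes, the left and/or right index is replaced by a random frame index
    taken from outside that neighbourhood (a simple form of scrambling).
    """
    if num_frames < 4 or not 1 <= center <= num_frames - 2:
        raise ValueError("need at least 4 frames and an interior center index")
    neighbourhood = (center - 1, center, center + 1)
    others = [i for i in range(num_frames) if i not in neighbourhood]
    label = rng.randrange(len(TRIPLET_CLASSES))
    left, right = center - 1, center + 1
    if TRIPLET_CLASSES[label] in ("left-anomaly", "total-anomaly"):
        left = rng.choice(others)
    if TRIPLET_CLASSES[label] in ("right-anomaly", "total-anomaly"):
        right = rng.choice(others)
    return (left, center, right), label


if __name__ == "__main__":
    rng = random.Random(0)
    patch = [[1, 2], [3, 4]]
    print(make_rotation_sample(patch, rng))
    print(make_triplet_sample(num_frames=10, center=5, rng=rng))
```

In a setup like this, each branch would be trained as an ordinary classifier on the generated labels (four rotation classes for the spatial branch, four triplet classes for the temporal branch), so no real anomalous footage is needed during training.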

Source journal: Pattern Recognition (Engineering Technology, Engineering: Electronic and Electrical)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles published: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.