Self-supervised learning video anomaly detection based on time interval prediction and noise classification

IF 7.6 | CAS Tier 1 (Computer Science) | JCR Q1 (Computer Science, Artificial Intelligence)
Yishuo Liu , Chuanxu Wang , Qingyang Yang , Lanxiao Li , Binghui Wang
DOI: 10.1016/j.patcog.2025.112198
Journal: Pattern Recognition, Volume 171, Article 112198
Published: 2025-07-24 (Journal Article)
Full text: https://www.sciencedirect.com/science/article/pii/S0031320325008593
Citations: 0

Abstract

Video Anomaly Detection (VAD) aims to automatically identify anomalous events in videos that significantly deviate from normal behavioral patterns. Self-supervised learning motivates models to learn effective features from unlabeled data by designing proxy tasks. However, existing approaches often rely on coarse-grained modeling, focusing mainly on global sequence order or holistic scene structures, which may limit their ability to capture subtle motion changes or localized anomalies. Therefore, this paper proposes a self-supervised learning framework combined with fine-grained spatio-temporal proxy tasks to extract key features more accurately. For the temporal branch, we design a time interval prediction task: given a fixed middle frame and randomly sampled frames from both sides, the model predicts their temporal intervals relative to the center frame, thereby modeling the dynamic patterns of behavior. To enhance temporal modeling capabilities, we introduce a multi-head self-attention mechanism to capture inter-frame dependencies in the input sequence. The spatial branch employs a noise classification task inspired by diffusion models, where varying levels of noise are added to image patches, and the model predicts the corresponding noise levels. This encourages learning of local appearance features and patch-level sensitivity to perturbations. Our method is trained in an end-to-end manner and does not rely on pre-trained models. Experiments on three benchmark datasets demonstrate stable performance: the method achieves AUC scores of 98.6% on UCSD Ped2, 91.7% on CUHK Avenue, and 83.7% on ShanghaiTech. These results suggest that the proposed approach can generalize well across different scenes, perspectives, and types of anomalous behavior.
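The temporal branch's proxy task can be illustrated with a toy data-preparation sketch: fix the middle frame of a clip, sample frames from both sides, and label each sampled frame with its signed offset from the center. The function name, sampling counts, and clip length below are illustrative assumptions, not the paper's actual implementation details.

```python
import random

def make_interval_sample(clip_len, num_side=2, seed=None):
    """Build one training example for a time-interval prediction task
    (hypothetical sketch): fix the middle frame, randomly sample
    `num_side` frames from each side, and label every selected frame
    with its signed temporal interval relative to the center frame."""
    rng = random.Random(seed)
    center = clip_len // 2
    left = rng.sample(range(0, center), num_side)            # frames before the center
    right = rng.sample(range(center + 1, clip_len), num_side)  # frames after the center
    frames = sorted(left) + [center] + sorted(right)
    # Proxy targets: signed interval of each frame relative to the center.
    intervals = [f - center for f in frames]
    return frames, intervals

frames, intervals = make_interval_sample(clip_len=9, num_side=2, seed=0)
```

A model trained on such pairs must regress (or classify) the intervals, which forces it to encode how appearance changes over time — the "dynamic patterns of behavior" the abstract refers to.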
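The spatial branch's diffusion-inspired proxy task can likewise be sketched as simple label construction: corrupt an image patch with Gaussian noise at a randomly chosen level and train a classifier to recover the level index. The noise schedule and names here are assumptions for illustration; the paper's actual levels are not given in the abstract.

```python
import numpy as np

# Hypothetical discrete noise schedule (std-dev per class).
NOISE_LEVELS = [0.0, 0.1, 0.25, 0.5]

def noisy_patch_example(patch, rng):
    """Corrupt one image patch with Gaussian noise at a randomly chosen
    level and return (noisy_patch, level_index) — the proxy label the
    spatial branch would be trained to predict."""
    level = int(rng.integers(len(NOISE_LEVELS)))
    sigma = NOISE_LEVELS[level]
    noisy = patch + rng.normal(0.0, sigma, size=patch.shape)
    return noisy.astype(np.float32), level

rng = np.random.default_rng(0)
patch = np.zeros((8, 8), dtype=np.float32)  # toy 8x8 patch
noisy, label = noisy_patch_example(patch, rng)
```

Predicting the noise level requires the network to model what a clean local appearance looks like, which is what gives it patch-level sensitivity to perturbations.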
Source journal: Pattern Recognition (Engineering Technology — Engineering: Electrical & Electronic)
CiteScore: 14.40
Self-citation rate: 16.20%
Articles per year: 683
Review time: 5.6 months
Journal description: The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.