Lijun Han, Gang Liang, Pengcheng Wang, Dingming Liu, Kui Zhao
{"title":"StreamVAD: A streaming framework with progressive context integration for multi-temporal scale video anomaly detection","authors":"Lijun Han, Gang Liang, Pengcheng Wang, Dingming Liu, Kui Zhao","doi":"10.1016/j.neucom.2025.131669","DOIUrl":null,"url":null,"abstract":"<div><div>Video anomaly detection (VAD) plays a crucial role in intelligent surveillance systems by identifying abnormal events in video streams. However, most existing methods either rely on isolated feature extraction—failing to model inter-action contextual relationships critical for complex anomaly recognition—or demand full-video processing via graph/hierarchical architectures, which incur high latency, computational burden, and parameter/memory inefficiency with depth. Lightweight designs mitigate costs but sacrifice temporal sensitivity through shallow networks and short-clip inputs, limiting detection of subtle or multi-scale anomalies in streaming scenarios. To address these challenges, we propose StreamVAD, a lightweight streaming anomaly detection framework that achieves low-latency, long-term temporal modeling with minimal computational overhead. A Key Clip Generator (KCG) filters redundant inputs in a streaming manner, allowing the model to focus on informative content while reducing computational cost. A progressive context integration (PCI) module incrementally expands the temporal receptive field by integrating historical context without full-sequence buffering, enabling efficient detection of complex long-term anomalies. Additionally, a multi-scale temporal selection (MTS) strategy dynamically adapts temporal resolution to capture both short- and long-term abnormalities. Extensive experiments on UCF-Crime, XD-Violence, and a supplemental long-term anomaly dataset demonstrate that StreamVAD achieves effective video anomaly detection with fewer parameters and lower latency. The code and dataset are available at <span><span>https://github.com/Han-lijun/StreamVAD</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"658 ","pages":"Article 131669"},"PeriodicalIF":6.5000,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225023410","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Video anomaly detection (VAD) plays a crucial role in intelligent surveillance systems by identifying abnormal events in video streams. However, most existing methods either rely on isolated feature extraction—failing to model inter-action contextual relationships critical for complex anomaly recognition—or demand full-video processing via graph/hierarchical architectures, which incur high latency, computational burden, and parameter/memory inefficiency with depth. Lightweight designs mitigate costs but sacrifice temporal sensitivity through shallow networks and short-clip inputs, limiting detection of subtle or multi-scale anomalies in streaming scenarios. To address these challenges, we propose StreamVAD, a lightweight streaming anomaly detection framework that achieves low-latency, long-term temporal modeling with minimal computational overhead. A Key Clip Generator (KCG) filters redundant inputs in a streaming manner, allowing the model to focus on informative content while reducing computational cost. A progressive context integration (PCI) module incrementally expands the temporal receptive field by integrating historical context without full-sequence buffering, enabling efficient detection of complex long-term anomalies. Additionally, a multi-scale temporal selection (MTS) strategy dynamically adapts temporal resolution to capture both short- and long-term abnormalities. Extensive experiments on UCF-Crime, XD-Violence, and a supplemental long-term anomaly dataset demonstrate that StreamVAD achieves effective video anomaly detection with fewer parameters and lower latency. The code and dataset are available at https://github.com/Han-lijun/StreamVAD.
期刊介绍:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.