StreamVAD: A streaming framework with progressive context integration for multi-temporal scale video anomaly detection

Impact factor 6.5; Zone 2 (Computer Science); JCR Q1 (Computer Science, Artificial Intelligence)

Neurocomputing Pub Date : 2025-09-26 DOI:10.1016/j.neucom.2025.131669

Lijun Han, Gang Liang, Pengcheng Wang, Dingming Liu, Kui Zhao

Neurocomputing, vol. 658, Article 131669. Full text: https://www.sciencedirect.com/science/article/pii/S0925231225023410

Abstract

Video anomaly detection (VAD) plays a crucial role in intelligent surveillance systems by identifying abnormal events in video streams. However, most existing methods either rely on isolated feature extraction—failing to model inter-action contextual relationships critical for complex anomaly recognition—or demand full-video processing via graph/hierarchical architectures, which incur high latency, computational burden, and parameter/memory inefficiency with depth. Lightweight designs mitigate costs but sacrifice temporal sensitivity through shallow networks and short-clip inputs, limiting detection of subtle or multi-scale anomalies in streaming scenarios. To address these challenges, we propose StreamVAD, a lightweight streaming anomaly detection framework that achieves low-latency, long-term temporal modeling with minimal computational overhead. A Key Clip Generator (KCG) filters redundant inputs in a streaming manner, allowing the model to focus on informative content while reducing computational cost. A progressive context integration (PCI) module incrementally expands the temporal receptive field by integrating historical context without full-sequence buffering, enabling efficient detection of complex long-term anomalies. Additionally, a multi-scale temporal selection (MTS) strategy dynamically adapts temporal resolution to capture both short- and long-term abnormalities. Extensive experiments on UCF-Crime, XD-Violence, and a supplemental long-term anomaly dataset demonstrate that StreamVAD achieves effective video anomaly detection with fewer parameters and lower latency. The code and dataset are available at https://github.com/Han-lijun/StreamVAD.
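The abstract describes the three components only at a high level, so the following is a minimal Python sketch of the general streaming pattern and not the authors' implementation. Everything in it is an assumption made for illustration: the distance-threshold filter standing in for the KCG, the exponential moving average standing in for the PCI module, the max-over-window-means rule standing in for the MTS strategy, and all names (`key_clips`, `ProgressiveContext`, `multi_scale_score`, `stream_scores`) and default parameters. For the actual method, see the repository linked above.

```python
import math
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional, Sequence, Tuple

Vector = List[float]


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def key_clips(features: Iterable[Sequence[float]],
              threshold: float) -> Iterator[Tuple[int, Vector]]:
    """Hypothetical stand-in for a key clip filter: keep a clip only if its
    feature differs from the last kept clip by more than `threshold`."""
    last: Optional[Vector] = None
    for index, feature in enumerate(features):
        current = list(feature)
        if last is None or l2_distance(current, last) > threshold:
            last = current
            yield index, current


class ProgressiveContext:
    """Hypothetical stand-in for context integration: a running summary of
    past clips kept as an exponential moving average, so no full-sequence
    buffer is needed."""

    def __init__(self, momentum: float = 0.9) -> None:
        self.momentum = momentum
        self.state: Optional[Vector] = None

    def update(self, feature: Sequence[float]) -> Vector:
        if self.state is None:
            self.state = list(feature)
        else:
            m = self.momentum
            self.state = [m * s + (1.0 - m) * x
                          for s, x in zip(self.state, feature)]
        return self.state


def multi_scale_score(history: Deque[float],
                      scales: Sequence[int] = (1, 4, 16)) -> float:
    """Hypothetical multi-scale rule: average the most recent k deviations
    for each scale k and report the largest average."""
    recent = list(history)
    return max(sum(recent[-k:]) / len(recent[-k:]) for k in scales)


def stream_scores(features: Iterable[Sequence[float]],
                  threshold: float = 0.5,
                  momentum: float = 0.9,
                  scales: Sequence[int] = (1, 4, 16)
                  ) -> Iterator[Tuple[int, float]]:
    """Yield (clip index, anomaly score) for each kept clip, one at a time."""
    context = ProgressiveContext(momentum)
    # Memory is bounded by the largest scale, not by the video length.
    history: Deque[float] = deque(maxlen=max(scales))
    for index, feature in key_clips(features, threshold):
        previous = context.state
        # Deviation from the historical context *before* absorbing this clip.
        deviation = 0.0 if previous is None else l2_distance(feature, previous)
        context.update(feature)
        history.append(deviation)
        yield index, multi_scale_score(history, scales)


if __name__ == "__main__":
    toy_stream = [[0, 0], [0, 0.1], [0, 0], [3, 4], [3, 4]]
    for clip_index, score in stream_scores(toy_stream):
        print(clip_index, score)
```

In this toy stream the near-duplicate clips are skipped by the filter, and the score is computed per kept clip as it arrives, which is the property the abstract emphasizes: low latency and a growing temporal context without buffering the whole video.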

Source journal

Neurocomputing (Engineering & Technology; Computer Science: Artificial Intelligence)

CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days

About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.