StreamVAD: A streaming framework with progressive context integration for multi-temporal scale video anomaly detection

Impact factor 6.5; Zone 2 (Computer Science); JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Lijun Han, Gang Liang, Pengcheng Wang, Dingming Liu, Kui Zhao
Neurocomputing, vol. 658, Article 131669. Published 2025-09-26. DOI: 10.1016/j.neucom.2025.131669
Article page: https://www.sciencedirect.com/science/article/pii/S0925231225023410
Citations: 0

Abstract

Video anomaly detection (VAD) plays a crucial role in intelligent surveillance systems by identifying abnormal events in video streams. However, most existing methods either rely on isolated feature extraction—failing to model inter-action contextual relationships critical for complex anomaly recognition—or demand full-video processing via graph/hierarchical architectures, which incur high latency, computational burden, and parameter/memory inefficiency with depth. Lightweight designs mitigate costs but sacrifice temporal sensitivity through shallow networks and short-clip inputs, limiting detection of subtle or multi-scale anomalies in streaming scenarios. To address these challenges, we propose StreamVAD, a lightweight streaming anomaly detection framework that achieves low-latency, long-term temporal modeling with minimal computational overhead. A Key Clip Generator (KCG) filters redundant inputs in a streaming manner, allowing the model to focus on informative content while reducing computational cost. A progressive context integration (PCI) module incrementally expands the temporal receptive field by integrating historical context without full-sequence buffering, enabling efficient detection of complex long-term anomalies. Additionally, a multi-scale temporal selection (MTS) strategy dynamically adapts temporal resolution to capture both short- and long-term abnormalities. Extensive experiments on UCF-Crime, XD-Violence, and a supplemental long-term anomaly dataset demonstrate that StreamVAD achieves effective video anomaly detection with fewer parameters and lower latency. The code and dataset are available at https://github.com/Han-lijun/StreamVAD.
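
The abstract names three streaming components but does not give their internals, so the following is only a toy sketch of how such a pipeline could be wired together, not the authors' implementation. Every name and parameter here (`StreamingSketch`, `sim_threshold`, `decay`, `scales`) is hypothetical. The key-clip filter is assumed to be a cosine-similarity check against the last kept clip, the progressive context is assumed to be an exponential moving average of clip features, and the anomaly score is a simple deviation from that context rather than a learned detector.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class StreamingSketch:
    """Toy streaming scorer; all design choices are assumptions, not taken from the paper."""

    def __init__(self, sim_threshold=0.95, decay=0.8, scales=(1, 4)):
        self.sim_threshold = sim_threshold  # assumed redundancy cut-off for key clips
        self.decay = decay                  # assumed EMA weight on historical context
        self.scales = scales                # assumed temporal window sizes (in key clips)
        self.last_key = None                # feature of the last clip that was kept
        self.context = None                 # running summary of past key clips
        # Bounded history: memory does not grow with stream length.
        self.history = deque(maxlen=max(scales))

    def step(self, feat):
        """Consume one clip feature; return a score, or None if the clip is skipped."""
        # 1) Key-clip filtering: skip clips nearly identical to the last kept one.
        if self.last_key is not None and cosine(feat, self.last_key) >= self.sim_threshold:
            return None
        self.last_key = list(feat)

        # 2) Progressive context: score against the running summary, then fold
        #    the new clip into it, so no full-sequence buffer is needed.
        if self.context is None:
            self.context = list(feat)
            score = 0.0
        else:
            score = 1.0 - cosine(feat, self.context)
            self.context = [
                self.decay * c + (1.0 - self.decay) * f
                for c, f in zip(self.context, feat)
            ]
        self.history.append(score)

        # 3) Multi-scale selection: average recent scores over each window size
        #    and keep the strongest response.
        recent = list(self.history)
        return max(sum(recent[-s:]) / len(recent[-s:]) for s in self.scales)
```

Calling `step` once per incoming clip feature keeps the state at one context vector plus at most `max(scales)` recent scores, which is the property the abstract describes as avoiding full-sequence buffering. In a real system the clip features would come from a video encoder and the score from a trained model.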
Source journal
Neurocomputing (Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days
Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.