Hierarchical Scene Normality-Binding Modeling for Anomaly Detection in Surveillance Videos

Proceedings of the 30th ACM International Conference on Multimedia
Publication date: 2022-10-10
DOI: 10.1145/3503161.3548199

Qianyue Bao, F. Liu, Yang Liu, Licheng Jiao, Xu Liu, Lingling Li


Citations: 9

Abstract

Anomaly detection in surveillance videos is an important topic in the multimedia community; it requires efficient extraction of scene context and capture of temporal information as a basis for decision-making. From the perspective of hierarchical modeling, we parse the surveillance scene from global to local and propose a Hierarchical Scene Normality-Binding Modeling framework (HSNBM) for anomaly detection. For the static background hierarchy, we design a Region Clustering-driven Multi-task Memory Autoencoder (RCM-MemAE), which simultaneously performs region segmentation and scene reconstruction. The normal prototypes of each local region are stored in memory, and the frame reconstruction error is subsequently amplified by global memory augmentation. For the dynamic foreground object hierarchy, we employ a Scene-Object Binding Frame Prediction module (SOB-FP) that binds every foreground object in the frame to the prototypes stored in the background hierarchy according to the object's position, thereby fully exploiting the normality relationship between foreground and background. The bound features are then fed into a decoder to predict the future movement of the objects. Through this binding mechanism between foreground and background, HSNBM integrates the "reconstruction" and "prediction" tasks and builds a semantic bridge between the two hierarchies. Finally, HSNBM fuses the anomaly scores of the two hierarchies to make a comprehensive decision. Extensive empirical studies on three standard video anomaly detection datasets demonstrate the effectiveness of the proposed HSNBM framework.
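The abstract does not give HSNBM's equations, so the following is only a minimal sketch of three generic ingredients it names, under stated assumptions. The memory read assumes MemAE-style addressing (softmax over cosine similarities to stored prototypes, then a weighted sum), here without MemAE's hard shrinkage. The position-based binding (look up the region label under an object's center and concatenate that region's prototype) and the score fusion (per-video min-max normalization followed by a weighted sum with an assumed weight `alpha`) are hypothetical illustrations. None of the function names come from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def memory_read(query, memory):
    """Assumed MemAE-style read: softmax over cosine similarities, then a
    weighted sum of the stored normal prototypes (no hard shrinkage)."""
    sims = [cosine(query, m) for m in memory]
    peak = max(sims)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(len(query))]


def bind_object(obj_feature, center, region_map, region_prototypes):
    """Hypothetical binding: take the region label under the object's center
    pixel and concatenate that region's prototype to the object feature."""
    row, col = center
    region = region_map[row][col]
    return obj_feature + region_prototypes[region]


def min_max(scores):
    """Rescale a per-video score sequence to [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(background_err, foreground_err, alpha=0.5):
    """Hypothetical fusion: normalize each hierarchy's per-frame error over the
    video, then take a weighted sum (higher means more anomalous)."""
    bg, fg = min_max(background_err), min_max(foreground_err)
    return [alpha * b + (1 - alpha) * f for b, f in zip(bg, fg)]


if __name__ == "__main__":
    memory = [[1.0, 0.0], [0.0, 1.0]]  # two toy "normal" prototypes
    print(memory_read([0.9, 0.1], memory))
    print(bind_object([0.5], (0, 1), [[0, 1], [1, 1]], {0: [1.0, 0.0], 1: [0.0, 1.0]}))
    print(fuse_scores([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
```

Because the read-out is always a convex combination of stored prototypes, a query that resembles no normal prototype is pulled back toward normal patterns; the resulting reconstruction of an abnormal input tends to be poor, which is the intuition behind using memory to amplify reconstruction error.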



