Video anomaly detection based on frame memory bank and decoupled asymmetric convolutions
Authors: Min Zhao, Chuanxu Wang, Jiajiong Li, Zitai Jiang
Journal of Electronic Imaging, published 2024-09-01
DOI: https://doi.org/10.1117/1.jei.33.5.053006
Video anomaly detection (VAD) is essential for monitoring systems. Prediction-based methods identify anomalies by comparing each predicted frame with the corresponding real frame. We propose an unsupervised VAD method based on a frame memory bank (FMB) and decoupled asymmetric convolution (DAConv), which addresses three problems encountered with auto-encoders (AE) in VAD: (1) how to mitigate the noise caused by jitter between frames, which is often overlooked; (2) how to overcome both the insufficient use of temporal information by traditional two-dimensional (2D) convolution and the heavy computational cost of three-dimensional (3D) convolution; and (3) how to make full use of normal data to improve the reliability of anomaly discrimination. First, we design a separate network to calibrate the video frames in the dataset. Second, we design DAConv to extract features from the video, addressing the absence of temporal information in 2D convolutions and the high computational complexity of 3D convolutions. In addition, an interval-frame mechanism mitigates the information redundancy caused by data reuse. Finally, we embed an FMB to store features of normal events, amplifying the contrast between normal and abnormal frames. Experiments on the UCSD Ped2, CUHK Avenue, and ShanghaiTech datasets yield AUC values of 98.7%, 90.4%, and 74.8%, respectively, supporting the effectiveness of the proposed method.
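The abstract does not say how the difference between predicted and real frames is turned into an anomaly score. As a minimal sketch, assuming the PSNR-based scoring that is common in prediction-based VAD (not confirmed as the authors' choice), the comparison could look like the following. The function names, the flat-list frame representation, and the per-video min-max normalization are all illustrative assumptions.

```python
import math


def mse(pred, real):
    """Mean squared error between two frames given as flat lists of floats."""
    if len(pred) != len(real) or not pred:
        raise ValueError("frames must be non-empty and the same size")
    return sum((p - r) ** 2 for p, r in zip(pred, real)) / len(pred)


def psnr(pred, real, max_val=1.0, eps=1e-8):
    """Peak signal-to-noise ratio in dB; eps avoids division by zero."""
    return 10.0 * math.log10(max_val ** 2 / (mse(pred, real) + eps))


def anomaly_scores(preds, reals):
    """Map per-frame PSNR to a score in [0, 1]; higher means more anomalous.

    Assumption: PSNR is min-max normalized over one video, so a frame the
    predictor reproduces poorly (low PSNR) receives a score close to 1.
    """
    values = [psnr(p, r) for p, r in zip(preds, reals)]
    lo, hi = min(values), max(values)
    if hi == lo:
        # All frames are predicted equally well; nothing stands out.
        return [0.0] * len(values)
    return [1.0 - (v - lo) / (hi - lo) for v in values]
```

Scores of this kind are typically compared against frame-level ground-truth labels to obtain the AUC figures reported above.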
About the journal:
The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.