Y. Liu, Di Li, Wei Zhu, Dingkang Yang, Jing Liu, Liang Song
MSN-net: Multi-Scale Normality Network for Video Anomaly Detection
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), published 2023-06-04
DOI: https://doi.org/10.1109/ICASSP49357.2023.10097052
Citations: 3
Abstract
Existing unsupervised video anomaly detection methods often suffer performance degradation because deep models overgeneralize. In this paper, we propose a simple yet effective Multi-Scale Normality network (MSN-net) that uses hierarchical memories to learn multi-level prototypical spatial-temporal patterns of normal events. Specifically, the hierarchical memory module interacts with the encoder through reading and writing operations during the training phase, preserving multi-scale normality in three separate memory pools. The decoder then predicts future frames from the features rewritten by the memorized normality, which diminishes its ability to predict anomalies. Experimental results show that MSN-net performs comparably to state-of-the-art methods, and extended analysis demonstrates the effectiveness of multi-scale normality learning.
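The abstract does not specify how the reading and writing operations are computed, so the following is only a minimal sketch of one common memory-module design, not the authors' implementation. It assumes dot-product addressing with a softmax for reading and a nearest-item moving-average update for writing. The function names, the `rate` parameter, the pool names, and all sizes are hypothetical and chosen purely for illustration.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def read_memory(queries, memory):
    """Rewrite each query as a softmax-weighted sum of memory items.

    queries: (N, C) encoder features, memory: (M, C) stored prototypes.
    Returns an (N, C) array built only from memory items.
    """
    weights = softmax(queries @ memory.T, axis=1)  # (N, M), rows sum to 1
    return weights @ memory


def write_memory(queries, memory, rate=0.1):
    """Move each memory item toward the mean of the queries closest to it.

    `rate` is an assumed update step, not a value taken from the paper.
    Items are L2-normalized after the update.
    """
    nearest = (queries @ memory.T).argmax(axis=1)  # index of best item per query
    updated = memory.copy()
    for m in range(memory.shape[0]):
        assigned = queries[nearest == m]
        if len(assigned) > 0:
            updated[m] = (1.0 - rate) * memory[m] + rate * assigned.mean(axis=0)
    norms = np.linalg.norm(updated, axis=1, keepdims=True)
    return updated / np.maximum(norms, 1e-12)


# Three separate pools, one per encoder scale (channel counts are made up).
rng = np.random.default_rng(0)
channels = {"shallow": 16, "middle": 32, "deep": 64}
pools = {name: rng.standard_normal((10, c)) for name, c in channels.items()}

for name, c in channels.items():
    features = rng.standard_normal((50, c))  # stand-in for encoder output
    pools[name] = write_memory(features, pools[name])  # training phase only
    rewritten = read_memory(features, pools[name])  # what a decoder would see
    print(name, rewritten.shape)
```

Because the read output is a convex combination of stored items, features of events unlike anything in memory are pulled toward normal prototypes, which is the intuition behind limiting the decoder's ability to predict anomalies.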