Adversarial diffusion for few-shot scene adaptive video anomaly detection
Authors: Yumna Zahid, Christine Zarges, Bernie Tiddeman, Jungong Han
Journal: Neurocomputing (impact factor 5.5; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-10-28
Publication type: Journal Article
DOI: 10.1016/j.neucom.2024.128796
URL: https://www.sciencedirect.com/science/article/pii/S0925231224015674
Citation count: 0
Abstract
Few-shot anomaly detection for video surveillance is challenging due to the diverse nature of target domains. Existing methods treat it as a one-class classification problem, training on a reduced sample of nominal scenes. They rely on either frame reconstruction or frame prediction to learn a manifold against which outliers can be detected during inference. We posit that the quality of image reconstruction or future-frame prediction is inherently important for identifying anomalous pixels in video frames. In this paper, we enhance image synthesis and mode coverage for video anomaly detection (VAD) by integrating a Denoising Diffusion model with a future-frame prediction model. Our novel VAD pipeline combines a Generative Adversarial Network with denoising diffusion to learn the underlying non-anomalous data distribution and generate high-fidelity future-frame samples in one step. We further regularize the image reconstruction with perceptual quality metrics such as the Multi-scale Structural Similarity Index Measure and Peak Signal-to-Noise Ratio, ensuring high-quality output with only a few episodic training iterations. Extensive experiments demonstrate that our method outperforms state-of-the-art techniques across multiple benchmarks, validating that high-quality image synthesis in frame prediction leads to robust anomaly detection in videos.
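The abstract does not say how prediction quality is turned into an anomaly decision. As a rough illustration only, the sketch below uses a convention that is common in frame-prediction approaches: compute the Peak Signal-to-Noise Ratio between each predicted frame and the frame that actually arrived, then min-max normalise over a clip so that poorly predicted frames score close to 1. The function names, the nested-list frame format, and the normalisation scheme are assumptions made for this example, not the authors' pipeline.

```python
import math
from typing import List, Sequence

# Assumption: a frame is a grayscale image stored as rows of floats in [0, 1].
Frame = Sequence[Sequence[float]]


def psnr(predicted: Frame, actual: Frame, max_value: float = 1.0) -> float:
    """Peak Signal-to-Noise Ratio in dB; higher means a closer prediction."""
    squared_errors = [
        (p - a) ** 2
        for predicted_row, actual_row in zip(predicted, actual)
        for p, a in zip(predicted_row, actual_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    # Clamp the error so identical frames give a large finite value (about 100 dB)
    # instead of infinity; the clamp is a choice made for this sketch.
    mse = max(mse, 1e-10)
    return 10.0 * math.log10(max_value ** 2 / mse)


def anomaly_scores(psnr_values: Sequence[float]) -> List[float]:
    """Min-max normalise PSNR over a clip and invert it.

    A score of 1.0 marks the worst-predicted frame, 0.0 the best-predicted one.
    """
    lo, hi = min(psnr_values), max(psnr_values)
    if hi == lo:
        # All frames were predicted equally well; nothing stands out.
        return [0.0 for _ in psnr_values]
    return [1.0 - (value - lo) / (hi - lo) for value in psnr_values]


if __name__ == "__main__":
    actual = [[0.5, 0.5], [0.5, 0.5]]
    good_prediction = [[0.5, 0.5], [0.5, 0.4]]   # small error: likely normal
    poor_prediction = [[0.9, 0.1], [0.9, 0.1]]   # large error: likely anomalous
    values = [psnr(good_prediction, actual), psnr(poor_prediction, actual)]
    print(anomaly_scores(values))
```

The intuition matches the abstract's argument: a model trained only on normal scenes predicts normal frames well and anomalous frames poorly, so a sharper generator should widen the gap between the two groups of scores.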
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.