Title: Multiscale Recovery Diffusion Model With Unsupervised Learning for Video Anomaly Detection System
Authors: Bo Li; Hongwei Ge; Yuxuan Liu; Guozhi Tang
DOI: 10.1109/TII.2024.3493390
Journal: IEEE Transactions on Industrial Informatics, vol. 21, no. 3, pp. 2104-2113 (Q1, Automation & Control Systems)
Publication date: 2024-12-03 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10774158/
Multiscale Recovery Diffusion Model With Unsupervised Learning for Video Anomaly Detection System
The rapid development of intelligent industry and smart cities has increased the number of surveillance devices, greatly increasing the need for unsupervised automatic anomaly detection in real-time video surveillance, which uses raw data without laborious manual annotation. Existing video anomaly detection (VAD) methods encounter limitations when using pretext tasks, such as reconstruction or prediction, to identify abnormal events, as these tasks are neither fully consistent with nor complementary to the essential objective of anomaly detection. Motivated by recent advances in diffusion models, we propose a multiscale recovery diffusion model, which relies on a novel and effective pretext task, named recovery, to introduce the notion of generation speed. It exploits the step-by-step generation of diffusion probabilistic models in unsupervised anomaly detection scenarios. By incorporating a proposed multiscale spatial-temporal subtraction module, our model captures more detailed appearance and motion information about foreground objects without relying on other high-level pretrained models. Furthermore, an innovative push–pull loss further widens the disparity between normal and abnormal events through pseudolabels. We validate our model on five established benchmarks (UCSD Ped1, UCSD Ped2, CUHK Avenue, ShanghaiTech, and UCF-Crime), achieving frame-level areas under the curve of 86.01%, 99.23%, 92.35%, 82.49%, and 74.79%, respectively, surpassing other state-of-the-art unsupervised VAD methods.
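The abstract does not specify how the multiscale spatial-temporal subtraction module is built; as a rough illustration of the general idea, one can difference consecutive frames at several spatial scales so that foreground motion stands out at multiple resolutions. The following is a minimal hypothetical sketch (function name, scale factors, and average-pooling choice are all assumptions, not the paper's actual design):

```python
import numpy as np

def multiscale_temporal_subtraction(frames, scales=(1, 2, 4)):
    """Illustrative sketch only: frame-to-frame differences at several
    spatial scales, exposing motion cues of foreground objects.

    frames: array of shape (T, H, W), a grayscale video clip.
    Returns {scale: difference maps of shape (T-1, H//scale, W//scale)}.
    """
    out = {}
    T, H, W = frames.shape
    for s in scales:
        # crop so H and W are divisible by s, then average-pool by factor s
        cropped = frames[:, :H - H % s, :W - W % s]
        pooled = cropped.reshape(T, H // s, s, W // s, s).mean(axis=(2, 4))
        # temporal subtraction: absolute difference of consecutive frames
        out[s] = np.abs(np.diff(pooled, axis=0))
    return out
```

A static background cancels out in the difference maps, while moving objects leave nonzero responses; coarser scales summarize larger motions, finer scales keep appearance detail.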
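The exact form of the push–pull loss is likewise not given in the abstract; a common way to realize the stated goal (widening the gap between normal and abnormal events using pseudolabels) is a margin objective that pulls scores of pseudolabeled-normal samples toward zero and pushes pseudolabeled-abnormal samples past a margin. The sketch below is an assumed generic formulation, not the paper's loss:

```python
import numpy as np

def push_pull_loss(scores, pseudolabels, margin=1.0):
    """Hypothetical sketch of a push-pull objective.

    scores: anomaly scores, nonnegative floats.
    pseudolabels: 0 = pseudolabeled normal, 1 = pseudolabeled abnormal.
    Pull term drives normal scores toward 0; push term penalizes abnormal
    scores that fall inside the margin.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(pseudolabels)
    pull = np.mean(scores[labels == 0] ** 2) if np.any(labels == 0) else 0.0
    push = (np.mean(np.maximum(0.0, margin - scores[labels == 1]) ** 2)
            if np.any(labels == 1) else 0.0)
    return pull + push
```

With this shape of objective, a normal sample scored 0 and an abnormal sample scored beyond the margin both contribute zero loss, so training pressure concentrates on samples whose scores contradict their pseudolabels.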
Journal Introduction:
The IEEE Transactions on Industrial Informatics is a multidisciplinary journal dedicated to publishing technical papers that connect theory with practical applications of informatics in industrial settings. It focuses on the utilization of information in intelligent, distributed, and agile industrial automation and control systems. The scope includes topics such as knowledge-based and AI-enhanced automation, intelligent computer control systems, flexible and collaborative manufacturing, industrial informatics in software-defined vehicles and robotics, computer vision, industrial cyber-physical and industrial IoT systems, real-time and networked embedded systems, security in industrial processes, industrial communications, systems interoperability, and human-machine interaction.