Multi-level feature splicing 3D network based on multi-task joint learning for video anomaly detection
Authors: Yang Li, Guoxiang Tong
Journal: Neurocomputing, volume 636, Article 129964 (published 2025-03-18)
DOI: 10.1016/j.neucom.2025.129964
URL: https://www.sciencedirect.com/science/article/pii/S0925231225006368
Journal metrics: impact factor 5.5; JCR Q1 (Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Deep learning methods for video anomaly detection aim to identify anomalous events accurately and efficiently. However, because anomaly samples are scarce and diverse, previous methods have not adequately taken into account important information about location and timing. In addition, the models generalize so strongly that anomalies can also be reconstructed or predicted well. To address these challenges, we propose a 3D network based on multi-level feature splicing with joint multi-task learning. The network takes an autoencoder (AE) as its backbone and improves on it. First, from a spatial perspective, we design a normal-sample training task and a Gaussian noise task to enhance the reconstruction of positive samples. From a temporal perspective, we design a frame-skipping task and an inverse-sequence task on the video to suppress the reconstruction of negative samples. Second, we use multi-level feature splicing in the encoding and decoding process so that the network can explore sufficient information at full scale, and we use an attention gating module to filter redundant features. The results show that our network is competitive with state-of-the-art methods: in terms of AUC, it achieves 99.3% on UCSD Ped2, 88.4% on CUHK Avenue, and 74.2% on ShanghaiTech Campus.
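The four training tasks named in the abstract can be illustrated by how their input clips might be constructed. The following is a minimal sketch, not the authors' implementation: the function name `make_task_inputs`, the clip layout `(T, H, W, C)` with pixel values in [0, 1], the clip length, the skip stride, and the noise level `sigma` are all assumptions chosen for illustration. The abstract does not specify the reconstruction targets or losses for each task, so none are shown.

```python
import numpy as np


def make_task_inputs(video, start=0, clip_len=8, skip=2, sigma=0.1, seed=0):
    """Build one input clip per task from a video array.

    video: float array of shape (T, H, W, C), values assumed to lie in [0, 1].
    All parameter names and defaults are hypothetical, not taken from the paper.
    """
    # The frame-skipping clip reaches furthest into the video, so check it fits.
    needed = start + (clip_len - 1) * skip + 1
    if video.shape[0] < needed:
        raise ValueError(f"video has {video.shape[0]} frames, need at least {needed}")

    rng = np.random.default_rng(seed)

    # Spatial tasks: the unmodified clip and a copy with additive Gaussian noise.
    normal = video[start:start + clip_len]
    noisy = np.clip(normal + rng.normal(0.0, sigma, size=normal.shape), 0.0, 1.0)

    # Temporal tasks: every `skip`-th frame, and the clip in reverse frame order.
    skipped = video[start:start + clip_len * skip:skip]
    inverse = normal[::-1]

    return {
        "normal": normal,
        "gaussian_noise": noisy,
        "frame_skip": skipped,
        "inverse_sequence": inverse,
    }


# Usage on random data standing in for a grayscale video.
demo_video = np.random.default_rng(1).random((32, 64, 64, 1))
demo_clips = make_task_inputs(demo_video)
for name, clip in demo_clips.items():
    print(name, clip.shape)
```

Each returned clip has the same shape, so in a setup like this all four could be fed through a single 3D autoencoder; how the per-task losses are combined is left open here because the abstract does not describe it.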
About the journal:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.