{"title":"基于时空的上下文融合视频异常检测","authors":"Chao Hu, Weibin Qiu, Weijie Wu, Liqiang Zhu","doi":"10.1109/PRMVIA58252.2023.00037","DOIUrl":null,"url":null,"abstract":"Video anomaly detection (VAD) detects target objects such as people and vehicles to discover abnormal events in videos. There are abundant spatio-temporal context information in different objects of videos. Most existing methods pay more attention to temporal context than spatial context in VAD. The spatial context information represents the relationship between the detection target and surrounding targets. Anomaly detection makes a lot of sense. To this end, a video anomaly detection algorithm based on target spatio-temporal context fusion is proposed. Firstly, the target in the video frame is extracted through the target detection network to reduce background interference. Then the optical flow map of two adjacent frames is calculated. Motion features are used multiple targets in the video frame to construct spatial context simultaneously, re-encoding the target appearance and motion features, and finally reconstructing the above features through the spatiotemporal dual-stream network, and using the reconstruction error to represent the abnormal score. The algorithm achieves frame-level AUCs of 98.5% on UCSDped2 and 86.3% on Avenue datasets. On UCSDped2 dataset, the spatio-temporal dual-stream network improves frames by 5.1% and 0.3%, respectively, compared to the temporal and spatial stream networks. After using spatial context encoding, the frame-level AUC is enhanced by 1%, which verifies the method’s effectiveness.","PeriodicalId":221346,"journal":{"name":"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Spatio-Temporal-based Context Fusion for Video Anomaly Detection\",\"authors\":\"Chao Hu, Weibin Qiu, Weijie Wu, Liqiang Zhu\",\"doi\":\"10.1109/PRMVIA58252.2023.00037\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Video anomaly detection (VAD) detects target objects such as people and vehicles to discover abnormal events in videos. There are abundant spatio-temporal context information in different objects of videos. Most existing methods pay more attention to temporal context than spatial context in VAD. The spatial context information represents the relationship between the detection target and surrounding targets. Anomaly detection makes a lot of sense. To this end, a video anomaly detection algorithm based on target spatio-temporal context fusion is proposed. Firstly, the target in the video frame is extracted through the target detection network to reduce background interference. Then the optical flow map of two adjacent frames is calculated. Motion features are used multiple targets in the video frame to construct spatial context simultaneously, re-encoding the target appearance and motion features, and finally reconstructing the above features through the spatiotemporal dual-stream network, and using the reconstruction error to represent the abnormal score. The algorithm achieves frame-level AUCs of 98.5% on UCSDped2 and 86.3% on Avenue datasets. On UCSDped2 dataset, the spatio-temporal dual-stream network improves frames by 5.1% and 0.3%, respectively, compared to the temporal and spatial stream networks. 
After using spatial context encoding, the frame-level AUC is enhanced by 1%, which verifies the method’s effectiveness.\",\"PeriodicalId\":221346,\"journal\":{\"name\":\"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2001-05-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PRMVIA58252.2023.00037\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Pattern Recognition, Machine Vision and Intelligent Algorithms (PRMVIA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PRMVIA58252.2023.00037","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Video anomaly detection (VAD) detects target objects such as people and vehicles in order to discover abnormal events in videos. Videos contain rich spatio-temporal context information across their different objects, yet most existing VAD methods pay more attention to temporal context than to spatial context. Spatial context information represents the relationship between a detected target and the targets surrounding it, which is valuable for anomaly detection. To this end, a video anomaly detection algorithm based on target spatio-temporal context fusion is proposed. First, the targets in each video frame are extracted by an object detection network to reduce background interference. The optical flow map between two adjacent frames is then computed. The appearance and motion features of the multiple targets in a frame are used jointly to construct the spatial context, re-encoding the target appearance and motion features. Finally, these features are reconstructed by a spatio-temporal dual-stream network, and the reconstruction error is used as the anomaly score. The algorithm achieves frame-level AUCs of 98.5% on the UCSDped2 dataset and 86.3% on the Avenue dataset. On UCSDped2, the spatio-temporal dual-stream network improves the frame-level AUC by 5.1% and 0.3% over the temporal-stream-only and spatial-stream-only networks, respectively, and spatial context encoding adds a further 1%, verifying the method's effectiveness.
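To make the scoring step concrete, below is a minimal sketch (not the authors' implementation) of reconstruction-error-based anomaly scoring with two autoencoder streams, one for appearance patches and one for optical-flow patches of detected targets. The module `StreamAE`, the helper `frame_score`, the 64x64 patch size, the equal-weight fusion of the two streams, and the max-over-targets aggregation are all illustrative assumptions.

```python
# Hedged sketch of dual-stream reconstruction-error scoring.
# All architecture sizes and the fusion rule are assumptions, not the paper's spec.
import torch
import torch.nn as nn

class StreamAE(nn.Module):
    """Tiny convolutional autoencoder standing in for one stream."""
    def __init__(self, in_ch):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, in_ch, 4, stride=2, padding=1),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

def frame_score(app_patches, flow_patches, app_ae, motion_ae):
    """Frame anomaly score = max fused reconstruction error over its targets."""
    with torch.no_grad():
        # Per-target mean squared reconstruction error for each stream.
        app_err = ((app_ae(app_patches) - app_patches) ** 2).mean(dim=(1, 2, 3))
        mot_err = ((motion_ae(flow_patches) - flow_patches) ** 2).mean(dim=(1, 2, 3))
    # Equal weighting of the two streams is an assumption.
    return (app_err + mot_err).max().item()

if __name__ == "__main__":
    app_ae, motion_ae = StreamAE(3), StreamAE(2)  # RGB patches; 2-channel flow
    app = torch.rand(4, 3, 64, 64)    # 4 detected targets in one frame
    flow = torch.rand(4, 2, 64, 64)   # their optical-flow patches
    print(frame_score(app, flow, app_ae, motion_ae))
```

In a full pipeline, `app_patches` would come from an object detector's crops and `flow_patches` from an optical-flow map between adjacent frames; taking the maximum over targets reflects the intuition that a single anomalous target should flag the whole frame.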