CF2M-Net: Cross-feature fusion and memory-constraint network for video anomaly detection
Authors: Qiming Ma, Chengyou Wang, Xiao Zhou
Journal: Information Sciences, vol. 723, Article 122673 (Journal Article, published 2025-09-12)
Impact factor: 6.8; JCR category: Computer Science, Information Systems
DOI: 10.1016/j.ins.2025.122673
URL: https://www.sciencedirect.com/science/article/pii/S0020025525008060
Video anomaly detection (VAD) aims to automatically identify events in surveillance videos that differ significantly from the normal pattern. Most existing methods learn the spatiotemporal distribution of normal features and detect deviations as anomalies. Typically, they employ autoencoders to learn appearance and motion features independently, but this separate learning limits the exploitation of their interrelation in real-world scenarios. To enhance the representation of normal patterns by capturing feature interrelation, we propose a cross-feature fusion and memory-constraint network (CF2M-Net) for VAD. Specifically, inspired by the representational ability of cross-attention in multimodal fusion, we design a cross-attention and memory-constraint (CM) module to enrich appearance features with motion information. To prevent overfitting to anomalous events, the memory-constraint module further constrains the fused features to the distribution of normal patterns. We also design an attention fusion (AF) decoder that predicts features closer to the normal distribution, enhancing their separability from anomalies. By jointly modeling appearance and motion through feature fusion and memory constraints, CF2M-Net provides more discriminative normal representations for anomaly detection. Experimental evaluations on three benchmark datasets show that CF2M-Net performs comparably with leading approaches. Moreover, the detailed evaluations indicate the effectiveness of normal-representation-based appearance-motion fusion features for VAD.
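The abstract names two mechanisms without giving their details: cross-attention that enriches appearance features with motion information, and a memory that constrains fused features to normal patterns. The sketch below is only a generic illustration of those two ideas, not the authors' CM module. It assumes plain scaled dot-product attention with no learned projections, a residual sum for fusion, and a soft memory read; every function name and toy value is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend(queries, keys, values):
    """Scaled dot-product attention: each query becomes a weighted mean of values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

def cross_fuse(appearance, motion):
    """Appearance tokens query motion tokens (assumed design);
    a residual sum keeps the original appearance content."""
    attended = attend(appearance, motion, motion)
    return [[a + m for a, m in zip(a_tok, m_tok)]
            for a_tok, m_tok in zip(appearance, attended)]

def memory_read(features, memory):
    """Replace each feature with a convex combination of memory items,
    which stand in for prototypes of normal patterns."""
    return attend(features, memory, memory)

# Toy inputs (hypothetical): 2 appearance tokens, 3 motion tokens, 2 memory items.
appearance = [[1.0, 0.0], [0.0, 1.0]]
motion = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
memory = [[1.0, 1.0], [2.0, 0.0]]

fused = cross_fuse(appearance, motion)
constrained = memory_read(fused, memory)
```

Because the memory read returns softmax-weighted averages, every output vector lies in the convex hull of the memory items. In this simplified setting, that is one way a memory can keep representations close to stored normal patterns, so inputs far from those patterns are represented poorly and stand out at detection time.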
About the journal:
Information Sciences: Informatics and Computer Science, Intelligent Systems, Applications is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.