{"title":"Deep Representation for Abnormal Event Detection in Crowded Scenes","authors":"Y. Feng, Yuan Yuan, Xiaoqiang Lu","doi":"10.1145/2964284.2967290","DOIUrl":null,"url":null,"abstract":"Abnormal event detection is extremely important, especially for video surveillance. Nowadays, many detectors have been proposed based on hand-crafted features. However, it remains challenging to effectively distinguish abnormal events from normal ones. This paper proposes a deep representation based algorithm which extracts features in an unsupervised fashion. Specially, appearance, texture, and short-term motion features are automatically learned and fused with stacked denoising autoencoders. Subsequently, long-term temporal clues are modeled with a long short-term memory (LSTM) recurrent network, in order to discover meaningful regularities of video events. The abnormal events are identified as samples which disobey these regularities. Moreover, this paper proposes a spatial anomaly detection strategy via manifold ranking, aiming at excluding false alarms. Experiments and comparisons on real world datasets show that the proposed algorithm outperforms state of the arts for the abnormal event detection problem in crowded scenes.","PeriodicalId":140670,"journal":{"name":"Proceedings of the 24th ACM international conference on Multimedia","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 24th ACM international conference on Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2964284.2967290","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 41
Abstract
Abnormal event detection is extremely important, especially for video surveillance. Nowadays, many detectors have been proposed based on hand-crafted features. However, it remains challenging to effectively distinguish abnormal events from normal ones. This paper proposes a deep representation based algorithm which extracts features in an unsupervised fashion. Specially, appearance, texture, and short-term motion features are automatically learned and fused with stacked denoising autoencoders. Subsequently, long-term temporal clues are modeled with a long short-term memory (LSTM) recurrent network, in order to discover meaningful regularities of video events. The abnormal events are identified as samples which disobey these regularities. Moreover, this paper proposes a spatial anomaly detection strategy via manifold ranking, aiming at excluding false alarms. Experiments and comparisons on real world datasets show that the proposed algorithm outperforms state of the arts for the abnormal event detection problem in crowded scenes.