{"title":"Real-Time Multi-human Tracking Using a Probability Hypothesis Density Filter and Multiple Detectors","authors":"Volker Eiselein, Dan Arp, Michael Pätzold, T. Sikora","doi":"10.1109/AVSS.2012.59","DOIUrl":"https://doi.org/10.1109/AVSS.2012.59","url":null,"abstract":"The Probability Hypothesis Density (PHD) filter is a multi-object Bayes filter which has recently attracted a lot of interest in the tracking community mainly for its linear complexity and its ability to deal with high clutter especially in radar/sonar scenarios. In the computer vision community however, underlying constraints are different from radar scenarios and have to be taken into account when using the PHD filter. In this article, we propose a new tree-based path extraction algorithm for a Gaussian Mixture PHD filter in Computer Vision applications. We also investigate how an additional benefit can be achieved by using a second human detector and justify an approximation for multiple sensors in low-clutter scenarios.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132499334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human Action Recognition in Large-Scale Datasets Using Histogram of Spatiotemporal Gradients","authors":"K. Reddy, Naresh P. Cuntoor, A. Perera, A. Hoogs","doi":"10.1109/AVSS.2012.40","DOIUrl":"https://doi.org/10.1109/AVSS.2012.40","url":null,"abstract":"Research in human action recognition has advanced along multiple fronts in recent years to address various types of actions including simple, isolated actions in staged data (e.g., KTH dataset), complex actions (e.g., Hollywood dataset) and naturally occurring actions in surveillance videos (e.g, VIRAT dataset). Several techniques including those based on gradient, flow and interest-points have been developed for their recognition. Most perform very well in standard action recognition datasets, but fail to produce similar results in more complex, large-scale datasets. Here we analyze the reasons for this less than successful generalization by considering a state-of-the-art technique, histogram of oriented gradients in spatiotemporal volumes as an example. This analysis may prove useful in developing robust and effective techniques for action recognition.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128087293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Contextual Statistics of Space-Time Ordered Features for Human Action Recognition","authors":"P. Bilinski, F. Brémond","doi":"10.1109/AVSS.2012.29","DOIUrl":"https://doi.org/10.1109/AVSS.2012.29","url":null,"abstract":"The bag-of-words approach with local spatio-temporal features have become a popular video representation for action recognition. Recent methods have typically focused on capturing global and local statistics of features. However, existing approaches ignore relations between the features, particularly space-time arrangement of features, and thus may not be discriminative enough. Therefore, we propose a novel figure-centric representation which captures both local density of features and statistics of space-time ordered features. Using two benchmark datasets for human action recognition, we demonstrate that our representation enhances the discriminative power of features and improves action recognition performance, achieving 96.16% recognition rate on popular KTH action dataset and 93.33% on challenging ADL dataset.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133888224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recovering People Tracking Errors Using Enhanced Covariance-Based Signatures","authors":"Julien Badie, Sławomir Bąk, S. Şerban, F. Brémond","doi":"10.1109/AVSS.2012.90","DOIUrl":"https://doi.org/10.1109/AVSS.2012.90","url":null,"abstract":"This paper presents a new approach for tracking multiple persons in a single camera. This approach focuses on recovering tracked individuals that have been lost and are detected again, after being miss-detected (e.g. occluded) or after leaving the scene and coming back. In order to correct tracking errors, a multi-cameras re-identification method is adapted, with a real-time constraint. The proposed approach uses a highly discriminative human signature based on covariance matrix, improved using background subtraction, and a people detection confidence. The problem of linking several tracklets belonging to the same individual is also handled as a ranking problem using a learned parameter. The objective is to create clusters of tracklets describing the same individual. The evaluation is performed on PETS2009 dataset showing promising results.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132825130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Person Re-identification by Efficient Impostor-Based Metric Learning","authors":"Martin Hirzer, P. Roth, H. Bischof","doi":"10.1109/AVSS.2012.55","DOIUrl":"https://doi.org/10.1109/AVSS.2012.55","url":null,"abstract":"Recognizing persons over a system of disjunct cameras is a hard task for human operators and even harder for automated systems. In particular, realistic setups show difficulties such as different camera angles or different camera properties. Additionally, also the appearance of exactly the same person can change dramatically due to different views (e.g., frontal/back) of carried objects. In this paper, we mainly address the first problem by learning the transition from one camera to the other. This is realized by learning a Mahalanobis metric using pairs of labeled samples from different cameras. Building on the ideas of Large Margin Nearest Neighbor classification, we obtain a more efficient solution which additionally provides much better generalization properties. To demonstrate these benefits, we run experiments on three different publicly available datasets, showing state-of-the-art or even better results, however, on much lower computational efforts. This is in particular interesting since we use quite simple color and texture features, whereas other approaches build on rather complex image descriptions!","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124721696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Counting People in the Crowd Using a Generic Head Detector","authors":"B. Venkatesh, A. Descamps, C. Carincotte","doi":"10.1109/AVSS.2012.87","DOIUrl":"https://doi.org/10.1109/AVSS.2012.87","url":null,"abstract":"Crowd counting and density estimation is still one of the important task in video surveillance. Usually a regression based method is used to estimate the number of people from a sequence of images. In this paper we investigate to estimate the count of people in a crowded scene. We detect the head region since this is the most visible part of the body in a crowded scene. The head detector is based on state-of-art cascade of boosted integral features. To prune the search region we propose a novel interest point detector based on gradient orientation feature to locate regions similar to the top of head region from gray level images. Two different background subtraction methods are evaluated to further reduce the search region. We evaluate our approach on PETS 2012 and Turin metro station databases. Experiments on these databases show good performance of our method for crowd counting.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124939282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Activity Search of Surveillance Video","authors":"Greg Castañón, Venkatesh Saligrama, André-Louis Caron, Pierre-Marc Jodoin","doi":"10.1109/AVSS.2012.58","DOIUrl":"https://doi.org/10.1109/AVSS.2012.58","url":null,"abstract":"We present a fast and flexible content-based retrieval method for surveillance video. Designing a video search robust to uncertain activity duration, high variability in object shapes and scene content is challenging. We propose a two-step approach to video search. First, local motion features are inserted into an inverted index using locality-sensitive hashing (LSH). Second, we utilize a novel optimization approach based on edit distance to minimize temporal distortion, limited obscuration and imperfect queries. This approach assembles the local features stored in the index into a video segment which matches the query video. Pre-processing of archival video is performed in real-time, and retrieval speed scales as a function of the number of matches rather than video length. We demonstrate the effectiveness of the approach for counting, motion pattern recognition and abandoned object applications using a pair of challenging video datasets.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125033456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boosting Face Recognition in Real-World Surveillance Videos","authors":"Le An, B. Bhanu, Songfan Yang","doi":"10.1109/AVSS.2012.17","DOIUrl":"https://doi.org/10.1109/AVSS.2012.17","url":null,"abstract":"Face recognition becomes a challenging problem in real-world surveillance videos where the low-resolution probe frames exhibit variations in pose, lighting condition, and facial expressions. This is in contrast with the gallery images which are generally frontal view faces acquired under controlled environments. A direct matching of probe images with gallery data often leads to poor recognition accuracy due to the significant discrepancy between the two kinds of data. In addition, the artifacts such as low resolution, blurriness and noise further enlarge this discrepancy. In this paper, we propose a video based face recognition framework using a novel image representation called warped average face (WAF). The WAFs are generated in two stages: in-sequence warping and frontal view warping. The WAFs can be easily used with various feature descriptors or classifiers. As compared to the original probe data, the image quality of the WAFs is significantly better and the appearance difference between the WAFs and the gallery data is suppressed. Given a probe sequence, only a few WAFs need to be generated for the recognition purpose. We test the proposed method on the ChokePoint dataset and our in-house dataset of surveillance quality. Experiments show that with the new image representation, the recognition accuracy can be boosted significantly.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124389583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Color Constancy Using Shadow-Based Illumination Maps for Appearance-Based Person Re-identification","authors":"Eduardo Monari","doi":"10.1109/AVSS.2012.22","DOIUrl":"https://doi.org/10.1109/AVSS.2012.22","url":null,"abstract":"Robust people re-identification is one of the most challenging task, and still an unsolved problem for several applications in video surveillance. A large number of approaches use colors as main features for object description, which are in fact important cues for re-identification. However, colors captured by a camera suffer from unknown and changing global and local illumination conditions in the scene. Thus, color constancy is an essential pre-condition for robust color-based person re-identification. In this paper we introduce a new approach for automated estimation and compensation of local illumination in the scene. The proposed approach allows for handling of multiple light sources in the scene, and to compensate backlight illumination simultaneously. Both are continuous problems with high relevance to practical use of color-based approaches in video surveillance.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132458046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective Background Adaptation Based Abnormal Acoustic Event Recognition for Audio Surveillance","authors":"Woohyun Choi, Jinsang Rho, D. Han, Hanseok Ko","doi":"10.1109/AVSS.2012.65","DOIUrl":"https://doi.org/10.1109/AVSS.2012.65","url":null,"abstract":"In this paper, a method for abnormal acoustic event recognition in an audio surveillance system is presented. We propose a recognition scheme based on a hierarchical structure using a feature combination of Mel-Frequency Cepstral Coefficient (MFCC), timbre, and spectral statistics. A selective background adaptation is proposed for robust abnormal acoustic event recognition in real-world situations. For training, we use a database containing 9 abnormal events (scream, glass breaking, and etc.) and 6 background noise types collected under various surveillance situations. Gaussian Mixture Model (GMM) is considered for classifying the representative abnormal acoustic events and for selecting the background noise for adaptation. Effectiveness of the proposed method is demonstrated via representative experimental results.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131560089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}