{"title":"Real-Time Pedestrian Tracking with Bacterial Foraging Optimization","authors":"H. T. Nguyen, B. Bhanu","doi":"10.1109/AVSS.2012.60","DOIUrl":"https://doi.org/10.1109/AVSS.2012.60","url":null,"abstract":"In this paper, we present swarm intelligence algorithms for pedestrian tracking. In particular, we present a modified Bacterial Foraging Optimization (BFO) algorithm and show that it outperforms PSO in a number of important metrics for pedestrian tracking. In our experiments, we show that BFO's search strategy is inherently more efficient than PSO under a range of variables with regard to the number of fitness evaluations which need to be performed when tracking. We also compare the proposed BFO approach with other commonly-used trackers and present experimental results on the CAVIAR dataset as well as on the difficult PETS2010 S2.L3 crowd video.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121889092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Background Subtraction for Real-Time Video Analytics Based on Multi-hypothesis Mixture-of-Gaussians","authors":"Mahfuzul Haque, M. Murshed","doi":"10.1109/AVSS.2012.15","DOIUrl":"https://doi.org/10.1109/AVSS.2012.15","url":null,"abstract":"Robust background subtraction (BS) is essential for high quality foreground detection in most video analytics systems. Recent BS techniques achieve superior detection quality mostly by exploiting the complementary strengths of multiple background models or processing stages. Consequently, these techniques fail to meet the operational requirements of real-time video analytics due to high computational overhead where BS is just the primary processing task. In this paper, we propose a new BS technique, named multi-hypothesis mixture-of-Gaussians (MH-MOG), suitable for real-time video analytics. The essential idea is to maintain a single background model based on perception-aware mixture-of-Gaussians and then, generating multiple detection hypotheses with different processing bases. Finally, only during the detection stage, the complementary strengths of the hypotheses are exploited to achieve superior detection quality without significant computational overhead. Comprehensive experimental evaluation validates the efficacy of MH-MOG.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117168539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Evaluation of Multi-camera Visual Tracking","authors":"L. Marcenaro, Pietro Morerio, C. Regazzoni","doi":"10.1109/AVSS.2012.86","DOIUrl":"https://doi.org/10.1109/AVSS.2012.86","url":null,"abstract":"Main drawbacks in single-camera multi-target visual tracking can be partially removed by increasing the amount of information gathered on the scene, i.e. by adding cameras. By adopting such a multi-camera approach, multiple sensors cooperate for overall scene understanding. However, new issues arise such as data association and data fusion. This work addresses the issue of evaluating the performance of a multi-camera tracking algorithm based on Rao-Blackwellized Monte Carlo data association (RBMCDA) on real data. For this purpose, a new metric based on three performance indexes is developed.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"10 11","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132071366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Ensemble of Rejecting Classifiers for Anomaly Detection of Audio Events","authors":"Donatello Conte, P. Foggia, G. Percannella, Alessia Saggese, M. Vento","doi":"10.1109/AVSS.2012.9","DOIUrl":"https://doi.org/10.1109/AVSS.2012.9","url":null,"abstract":"Audio analytic systems are receiving an increasing interest in the scientific community, not only as stand alone systems for the automatic detection of abnormal events by the interpretation of the audio track, but also in conjunction with video analytics tools for enforcing the evidence of anomaly detection. In this paper we present an automatic recognizer of a set of abnormal audio events that works by extracting suitable features from the signals obtained by microphones installed into a surveilled area, and by classifying them using two classifiers that operate at different time resolutions. An original aspect of the proposed system is the estimation of the reliability of each response of the individual classifiers. In this way, each classifier is able to reject the samples having an overall reliability below a threshold. This approach allows our system to combine only reliable decisions, so increasing the overall performance of the method. The system has been tested on a large dataset of samples acquired from real world scenarios, the audio classes of interests are represented by gunshot, scream and glass breaking in addition to the background sounds. The preliminary results obtained encourage further research in this direction.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115114191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight and Robust Shadow Removal for Foreground Detection","authors":"A. Gawde, Kedar Joshi, Senem Velipasalar","doi":"10.1109/AVSS.2012.44","DOIUrl":"https://doi.org/10.1109/AVSS.2012.44","url":null,"abstract":"Background subtraction is a commonly used method to detect moving objects from videos captured by static cameras. However, shadows and reflections significantly affect the output of background subtraction algorithms, and distort the shape of the objects obtained as a result. Thus, shadow detection and removal is a crucial post-processing step to perform accurate object tracking required by different applications. We present a lightweight method to detect and remove shadows as well as reflection effects in indoor and outdoor environments by using spatial and spectral features. This method incorporates an adaptive way to set thresholds to avoid preset numbers. We present a comparison of the outputs we obtained with those of several other methods. The experimental results demonstrate the success of the proposed algorithm.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"133 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127319834","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Abandoned Object's Owner Detection: A Case Study of Hybrid Mobile-Fixed Video Surveillance System","authors":"Minh-Son Dao, Riccardo Mattivi, F. D. Natale, Keita Masui, N. Babaguchi","doi":"10.1109/AVSS.2012.4","DOIUrl":"https://doi.org/10.1109/AVSS.2012.4","url":null,"abstract":"In this paper, a new framework of hybrid mobile-fixed video surveillance system (HMFVSS) is introduced. The purpose of this framework is to overcome common problems of existing mobile or fixed video surveillance systems: (1) moral harassment: due to unfriendly or unnaturally installed mobile sensors, and (2) blind areas: due to narrow-scope moving of fixed cameras. A case study of abandoned object's owner alert system (AOOAS) is also presented to emphasize the framework's advantages. IP cameras and \"Spyglass\" (i.e. a mobile camera embedded on glasses) are used as fixed and mobile sensors, respectively. There are three main tasks are inherited, developed, and integrated: (1) image registration for automatically locating abandoned object, (2) common histogram based abandoned object's owner detection, and (3) faces recognition. The experimental results with careful evaluation and comparison with others shows that the proposed framework moves a step ahead in video surveillance system.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129129595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dana36: A Multi-camera Image Dataset for Object Identification in Surveillance Scenarios","authors":"J. Pers, Vildana Sulic Kenk, Rok Mandeljc, M. Kristan, S. Kovacic","doi":"10.1109/AVSS.2012.33","DOIUrl":"https://doi.org/10.1109/AVSS.2012.33","url":null,"abstract":"We present a novel dataset for evaluation of object matching and recognition methods in surveillance scenarios. Dataset consists of more than 23,000 images, depicting 15 persons and nine vehicles. A ground truth data - the identity of each person or vehicle - is provided, along with the coordinates of the bounding box in the full camera image. The dataset was acquired from 36 stationary camera views using a variety of surveillance cameras with resolutions ranging from standard VGA to three megapixel. 27 cameras observed the persons and vehicles in an outdoor environment, while the remaining nine observed the same persons indoors. The activity of persons was planned in advance, they drive the cars to the parking lot, exit the cars and walk around the building, through the main entrance, and up the stairs, towards the first floor of the building. The intended use of the dataset is performance evaluation of computer vision methods that aim to (re)identify people and objects from many different viewpoints in different environments and under variable conditions. Due to variety of camera locations, vantage points and resolutions, the dataset provides means to adjust the difficulty of the identification task in a controlled and documented manner. An interface for easy use of dataset within Matlab is provided as well, and the data is complemented by baseline results using a basic color histogram-based descriptor. While the cropped images of persons and vehicles represent the primary data in our dataset, we also provide full-frame images and a set of tracklets for each object as a courtesy to the dataset users.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121853692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative Sparse Approximation for Multiple-Shot Across-Camera Person Re-identification","authors":"Yang Wu, M. Minoh, M. Mukunoki, Wei Li, S. Lao","doi":"10.1109/AVSS.2012.21","DOIUrl":"https://doi.org/10.1109/AVSS.2012.21","url":null,"abstract":"In this paper we propose a simple and effective solution to the important and challenging problem of across-camera person re-identification. We focus on the common case in video surveillance where multiple images or video frames are available for each person. Instead of exploring new features, the proposed approach aims at making a better use of such images/frames. It builds a collaborative representation over all the gallery images (of known person individuals) to best approximate the query images (containing an unknown person) via affine combinations. The approximation is measured by the nearest point distance between the two affine hulls constructed by the query images and gallery images, respectively. By enforcing the sparsity of the samples used for approximating the two nearest points, the relative importance of the gallery images belonging to different persons has the ability to reveal the identity of the querying person. Extensive experiments on public benchmark datasets demonstrate that the proposed approach greatly outperforms the state-of-the-art methods.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128475686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Appearance-Based Re-identification of Humans in Low-Resolution Videos Using Means of Covariance Descriptors","authors":"J. Metzler","doi":"10.1109/AVSS.2012.12","DOIUrl":"https://doi.org/10.1109/AVSS.2012.12","url":null,"abstract":"The objective of human re-identification is to recognize a specific individual on different locations and to determine whether an individual has already appeared. This is especially in multi-camera networks with non-overlapping fields of view of interest. However, this is still an unsolved computer vision task due to several challenges, e.g. significant changes of appearance of humans as well as different illumination, camera parameters etc. In addition, for instance, in surveillance scenarios only low-resolution videos are usually available, so that biometric approaches may not be applied. This paper presents a whole-body appearance-based human re-identification approach for low-resolution videos. We propose a novel appearance model computed from several images of an individual. The model is based on means of covariance descriptors determined by spectral clustering techniques. The proposed approach is tested on a multi-camera data set of a typical surveillance scenario and compared to a color histogram based method.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131699425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semi-supervised Gait Recognition Based on Self-Training","authors":"Yanan Li, Yilong Yin, Lili Liu, Shaohua Pang, Qiuhong Yu","doi":"10.1109/AVSS.2012.66","DOIUrl":"https://doi.org/10.1109/AVSS.2012.66","url":null,"abstract":"Traditional gait recognition researches focus on supervised learning methods that use only a limited number of labeled sequences to train, which will definitely restrict the recognition ability of the gait recognition system. Meanwhile, training with more typical gait sequences can improve the generalization ability of gait recognition system and eventually achieve better recognition accuracy. However, it is difficult, expensive, time consuming and boring to capture enough gait sequences comparing with capturing other biometric traits such as fingerprint, face and iris during the enrolment stage. To address the problem, a semi-supervised gait recognition algorithm based on self-training is proposed to optimize the performance of gait recognition system with both a few labeled sequences and a large amount of unlabeled sequences. Nearest Neighbor (NN) classifier and K-Nearest Neighbor (KNN) classifier are carried out to recognize the different subjects. Experimental results show that the proposed algorithm has an encouraging recognition performance even with only one labeled sequence each class.","PeriodicalId":275325,"journal":{"name":"2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2012-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132966011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}