{"title":"Gender recognition from face images with trainable COSFIRE filters","authors":"G. Azzopardi, Antonio Greco, M. Vento","doi":"10.1109/AVSS.2016.7738068","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738068","url":null,"abstract":"Gender recognition from face images is an important application in the fields of security, retail advertising and marketing. We propose a novel descriptor based on COSFIRE filters for gender recognition. A COSFIRE filter is trainable, in that its selectivity is determined in an automatic configuration process that analyses a given prototype pattern of interest. We demonstrate the effectiveness of the proposed approach on a new dataset called GENDER-FERET with 474 training and 472 test samples and achieve an accuracy rate of 93.7%. It also outperforms an approach that relies on handcrafted features and an ensemble of classifiers. Furthermore, we perform another experiment by using the images of the Labeled Faces in the Wild (LFW) dataset to train our classifier and the test images of the GENDER-FERET dataset for evaluation. This experiment demonstrates the generalization ability of the proposed approach and it also outperforms two commercial libraries, namely Face++ and Luxand.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114329541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-frequency analysis for audio event detection in real scenarios","authors":"Alessia Saggese, N. Strisciuglio, M. Vento, N. Petkov","doi":"10.1109/AVSS.2016.7738082","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738082","url":null,"abstract":"We propose a sound analysis system for the detection of audio events in surveillance applications. The method that we propose combines short- and long-time analysis in order to increase the reliability of the detection. The basic idea is that a sound is composed of small, atomic audio units and some of them are distinctive of a particular class of sounds. Similarly to the words in a text, we count the occurrence of audio units for the construction of a feature vector that describes a given time interval. A classifier is then used to learn which audio units are distinctive for the different classes of sound. We compare the performance of different sets of short-time features by carrying out experiments on the MIVIA audio event data set. We study the performance and the stability of the proposed system when it is employed in live scenarios, so as to characterize its expected behavior when used in real applications.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"Suppl 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128456006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving surface normals based action recognition in depth images","authors":"X. Nguyen, T. Nguyen, F. Charpillet","doi":"10.1109/AVSS.2016.7738053","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738053","url":null,"abstract":"In this paper, we propose a new local descriptor for action recognition in depth images. Our proposed descriptor jointly encodes the shape and motion cues using surface normals in 4D space of depth, time, spatial coordinates and higher-order partial derivatives of depth values along spatial coordinates. In a traditional Bag-of-words (BoW) approach, local descriptors extracted from a depth sequence are encoded to form a global representation of the sequence. In our approach, local descriptors are encoded using Sparse Coding (SC) and Fisher Vector (FV), which have been recently proven effective for action recognition. Action recognition is then simply performed using a linear SVM classifier. Our proposed action descriptor is evaluated on two public benchmark datasets, MSRAction3D and MSRGesture3D. The experimental result shows the effectiveness of the proposed method on both the datasets.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124431692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of deep learning for gender classification on traditional datasets","authors":"M. D. Coco, P. Carcagnì, Marco Leo, P. Mazzeo, P. Spagnolo","doi":"10.1109/AVSS.2016.7738061","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738061","url":null,"abstract":"Deep Learning has becoming a popular and effective way to address a large set of issues. In particular, in computer vision, it has been exploited to get satisfying recognition performance in unconstrained conditions. However, this wild race towards even better performance in extreme conditions has overshadowed an important step i.e. the assessment of the impact of this new methodology on traditional issues on which for years the researchers had worked. This is particularly true for biometrics applications where the evaluation of deep learning has been made directly on newest large and more challencing datasets. This lead to a pure data driven evaluation that makes difficult to analyze the relationships between network configurations, learning process and experienced outcomes. This paper tries to partially fill this gap by applying a DNN for gender recognition on the MORPH dataset and evaluating how a lower cardinality of examples used for learning can bias the recognition performance.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124281453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Training a convolutional neural network for multi-class object detection using solely virtual world data","authors":"Erik Bochinski, Volker Eiselein, T. Sikora","doi":"10.1109/AVSS.2016.7738056","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738056","url":null,"abstract":"Convolutional neural networks are a popular choice for current object detection and classification systems. Their performance improves constantly but for effective training, large, hand-labeled datasets are required. We address the problem of obtaining customized, yet large enough datasets for CNN training by synthesizing them in a virtual world, thus eliminating the need for tedious human interaction for ground truth creation. We developed a CNN-based multi-class detection system that was trained solely on virtual world data and achieves competitive results compared to state-of-the-art detection systems.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116907042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowd semantic segmentation based on spatial-temporal dynamics","authors":"Jijia Li, Hua Yang, Shuang Wu","doi":"10.1109/AVSS.2016.7738032","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738032","url":null,"abstract":"Crowd semantic segmentation is supposed to not only accurately segment the crowd into groups but also describe them by semantic properties. We define a group as a set of members sharing common spatial-temporal dynamics, i.e., motion consistency and distribution homogeneity. This paper proposes a novel crowd semantic segmentation method, termed as joint spatial-temporal semantic segmentation, which leverages the temporal motion characteristics and spatial distribution information of crowd. We first conduct temporal motion grouping and spatial distribution grouping according to motion consistency and distribution homogeneity respectively. Then, a a joint semantic segmentation algorithm is employed to combine the motion and distribution groups into semantic groups. States of these groups are described in terms of motion pattern and density level. Experiments show that our proposed method is effective to obtain favorable segmentation with semantic descriptions.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"13 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120894528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prioritized target tracking with active collaborative cameras","authors":"Yiming Wang, A. Cavallaro","doi":"10.1109/AVSS.2016.7738066","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738066","url":null,"abstract":"Mobile cameras on robotic platforms can support fixed multi-camera installations to improve coverage and target localization accuracy. We propose a novel collaborative framework for prioritized target tracking that complement static cameras with mobile cameras, which track targets on demand. Upon receiving a request from static cameras, a mobile camera selects (or switches to) a target to track using a local selection criterion that accounts for target priority, view quality and energy consumption. Mobile cameras use a receding horizon scheme to minimize tracking uncertainty as well as energy consumption when planning their path. We validate the proposed framework in simulated realistic scenarios and show that it improves tracking accuracy and target observation time with reduced energy consumption compared to a framework with only static cameras and compared to a state-of-the-art motion strategy.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"s1-6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127197341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tracking-based detection of driving distraction from vehicular interior video","authors":"Tashrif Billah, S. Rahman","doi":"10.1109/AVSS.2016.7738077","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738077","url":null,"abstract":"Distraction during driving is a growing concern for global road safety. Different activities impertinent to driving hinder the concentration of driver on road and often cause substantial damage to life and property. For making driving safe, an algorithm is proposed in this paper that is capable of detecting distraction during driving. The proposed algorithm tracks key body parts of the driver in video captured by a front camera. Euclidean distances between the tracking trajectories of body parts are used as representative features that characterize the state of distraction or attention of a driver. The well-known K-nearest neighbor classifier is applied for detecting distraction from the features extracted from body parts. The proposed method is compared with existing methods implementing tracking-based human action identification to corroborate its improved performance.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"34 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114038452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HoG based real-time multi-target tracking in Bayesian framework","authors":"M. Ullah, F. A. Cheikh, Ali Shariq Imran","doi":"10.1109/AVSS.2016.7738080","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738080","url":null,"abstract":"Multi-target tracking is one of the most challenging tasks in computer vision. Several complex techniques have been proposed in the literature to tackle the problem. The main idea of such approaches is to find an optimal set of trajectories within a temporal window. The performance of such approaches are fairly good but their computational complexity is too high making them unpractical. In this paper, we propose a novel tracking-by-detection approach in a Bayesian filtering framework. The appearance of a target is modeled through HoG descriptor and the critical problem of target association is solved through combinatorial optimization. It is a simple yet very efficient approach and experimental results show that it achieves state-of-the-art performance in real time.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124730005","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A visual SLAM-based approach for calibration of distributed camera networks","authors":"T. Pollok, Eduardo Monari","doi":"10.1109/AVSS.2016.7738081","DOIUrl":"https://doi.org/10.1109/AVSS.2016.7738081","url":null,"abstract":"This paper presents a concept which tackles the pose estimation problem (extrinsic calibration) for distributed, non-overlapping multi-camera networks. The basic idea is to use a visual SLAM technique in order to reconstruct the scene from a video which includes areas visible by each camera of the network. The reconstruction consists of a sparse, but highly accurate point cloud, representing a joint 3D reference coordinate system. Additionally, a set of 3D-registered keyframes (images) are used for high resolution representation of the scene which also include a mapping between a set of 2D pixels to 3D points of the point cloud. The pose estimation of each surveillance camera is performed individually by assigning 2D-2D correspondences between pixels of the surveillance camera and pixels of similar keyframes that map to a 3D point. This allows to implicitly obtain a set of 2D-3D correspondences between pixels in the surveillance camera and their corresponding 3D points in a joint reference coordinate system. Thus the global camera pose can be estimated using robust methods for solving the perspective-n-point problem.","PeriodicalId":438290,"journal":{"name":"2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129907340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}