{"title":"Deep Face Image Retrieval for Cancelable Biometric Authentication","authors":"Young Kyun Jang, N. Cho","doi":"10.1109/AVSS.2019.8909878","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909878","url":null,"abstract":"This paper presents a cancelable biometric system for face authentication by exploiting the convolutional neural network (CNN)-based face image retrieval system. For the cancelable biometrics we must build a template that achieves good performance while maintaining some essential conditions. First the same template should not be used in different applications. Second if the compromise event occurs original biometric data should not be retrieved from the template. Last the template should be easily discarded and recreated. Hence we propose a Deep Table-based Hashing (DTH) framework that encodes CNN-based features into a binary code by utilizing the index of the hashing table. We employ noise embedding and intra-normalization that distorts biometric data which enhances the non-invertibility. For training we propose a new segment-clustering loss and pairwise Hamming loss with two classification losses. The final authentication results are obtained by voting on the outcome of the retrieval system. Experiments conducted on two large scale face image datasets demonstrate that the proposed method works as a proper cancelable biometric system.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133139573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exemplar-Based Pseudo-Viewpoint Rotation for White-Cane User Recognition from a 2D Human Pose Sequence","authors":"Naoki Nishida, Yasutomo Kawanishi, Daisuke Deguchi, I. Ide, H. Murase, Jun Piao","doi":"10.1109/AVSS.2019.8909825","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909825","url":null,"abstract":"In recent years, various facilities are equipped to support visually impaired people, but accidents caused by visual disabilities still occur. In this paper, to support the visually-impaired people in a public space, we aim to classify whether a pedestrian image sequence obtained by a surveillance camera is a white-cane user or not from the temporal transition of a human pose represented as 2D coordinates. However, since the appearance of the 2D pose varies largely depending on the viewpoint of the pose, it is difficult to classify them. So, in this paper, we propose a method to rotate the viewpoint of a pose from various pseudo-viewpoints based on a pair of 2D poses simultaneously observed and classify the sequence by multiple classifiers corresponding to each viewpoint. Viewpoint rotation makes it possible to obtain pseudo-poses seen from various pseudo-viewpoints, extract richer pose features, and recognize white-cane users more accurately. Through an experiment, we confirmed that the proposed method improves the recognition rate by 12% compared to the method not employing viewpoint rotation.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114542074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-Time Traffic Analysis using Deep Learning Techniques and UAV based Video","authors":"Huaizhong Zhang, Mark Liptrott, Nikolaos Bessis, Jianquan Cheng","doi":"10.1109/AVSS.2019.8909879","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909879","url":null,"abstract":"In urban environments there are daily issues of traffic congestion which city authorities need to address. Realtime analysis of traffic flow information is crucial for efficiently managing urban traffic. This paper aims to conduct traffic analysis using UAV-based videos and deep learning techniques. The road traffic video is collected by using a position-fixed UAV. The most recent deep learning methods are applied to identify the moving objects in videos. The relevant mobility metrics are calculated to conduct traffic analysis and measure the consequences of traffic congestion. The proposed approach is validated with the manual analysis results and the visualization results. The traffic analysis process is real-time in terms of the pre-trained model used.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115788429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SSSNet: Small-Scale-Aware Siamese Network for Gastric Cancer Detection","authors":"Chih-Chung Hsu, Hsin-Ti Ma, Jun-Yi Lee","doi":"10.1109/AVSS.2019.8909849","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909849","url":null,"abstract":"In recent years, deep neural networks have become the most powerful supervised learning method. Several advanced neural networks, such as AlexNet, ZFNet, Inception, ResNet, and DenseNet, have achieved excellent performance on image recognition tasks. However, deep neural networks rely heavily on huge training sets to obtain good performance. Many applications, such as medical image analysis, do not allow for such large training sets, and it is difficult to train such networks on small-scale training sets. Magnifying narrow band imaging (M-NBI) is widely used to assist doctors in diagnosing gastric cancer, but relatively few of these images are available, compared with the number of general images. In this paper, we propose to use a Siamese network architecture to learn discriminative feature representations based on pairs of images. Then, we use a micro neural network to recognize these features and classify the input images. Our experimental results show that the proposed network can effectively learn discriminative features from a limited number of training images, and also that it can successfully recognize gastric cancer in M-NBI images.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116257294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatio-Temporal Semantic Segmentation for Drone Detection","authors":"Céline Craye, Salem Ardjoune","doi":"10.1109/AVSS.2019.8909854","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909854","url":null,"abstract":"The democratization of drones over the past decade has opened wide cracks in airspace security. Research in drone detection and neutralization for critical infrastructures is a very active area with a number of open issues, such as robust detection of drones based on opto-electronic imaging. Indeed, drones at a certain distance only represent a few pixel points on an image, even on a high resolution camera, and can be easily mistaken for birds or any other flying objects in the airspace. In this context, we propose a spatio-temporal semantic segmentation approach based on convolutional neural networks. We handle the problem of detecting very small targets by using a U-Net architecture to identify areas of interest within the larger image. Then, we use a classification network, ResNet, to determine whether those areas contain a drone or not. To further help the localization and classification process, we provide spatiotemporal input patches to our networks. Drones are mostly moving targets, and birds do not follow the same kinds of trajectories; therefore, this additional feature significantly increases overall performance. This work was carried out in the context of the 2019 Drone-vs-Bird detection Challenge. The evaluation is conducted on the provided dataset under several configurations.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124110719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Efficient Single-stage Steel Surface Defect Detection","authors":"Fityanul Akhyar, Chih-Yang Lin, K. Muchtar, Tung-Ying Wu, Hui-Fuang Ng","doi":"10.1109/AVSS.2019.8909834","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909834","url":null,"abstract":"To date, deep learning has been widely introduced in many fields, including object detection, medical imaging, and automation. One important application that uses deep learning based object detection is detecting defects by simply evaluating the image of an object. Such systems must be accurate, robust and efficient. Single-stage and two-stage object detection are two main approaches used in defect detection systems. A revised version of the popular object detection method called single shot multi-box detector (SSD) and the residual network (ResNet) offer a two-stage method to automatically detect defects with higher precision but has shown room for improvement with regard to speed performance. Therefore, in this paper, we propose a fully automatic pipeline for detecting defects, especially on steel surfaces. A novel transformation of the two-stage defect detection process into a more efficient single-stage detection process was introduced by utilizing a state-of-the-art method called RetinaNet. In addition, we leverage a feature pyramid network (FPN) and focal loss optimization to solve the small object detection problem and to deal with imbalanced background-foreground samples issue, respectively. Experimental results show that the proposed single-stage pipeline can achieve high accuracy and faster speed in steel surface defect detection.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"92 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124589094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Person Head Detection Based Deep Model for People Counting in Sports Videos","authors":"Sultan Daud Khan, H. Ullah, M. Ullah, N. Conci, F. A. Cheikh, Azeddine Beghdadi","doi":"10.1109/AVSS.2019.8909898","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909898","url":null,"abstract":"People counting in sports venues is emerging as a new domain in the field of video surveillance. People counting in these venues faces many key challenges, such as severe occlusions, few pixels per head, and significant variations in person's head sizes due to wide sport areas. We propose a deep model based method, which works as a head detector and takes into consideration the scale variations of heads in videos. Our method is based on the notion that head is the most visible part in the sports venues where large number of people are gathered. To cope with the problem of different scales, we generate scale aware head proposals based on scale map. Scale aware proposals are then fed to the Convolutional Neural Network (CNN) and it provides a response matrix containing the presence probabilities of people observed across scene scales. We then use non-maximal suppression to get the accurate head positions. For the performance evaluation, we carry out extensive experiments on two standard datasets and compare the results with state-of-the-art (SoA) methods. The results in terms of Average Precision (AvP), Average Recall (AvR), and Average F1-Score (AvF-Score) show that our method is better than SoA methods.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127814491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Interactive Framework for Cross-modal Attribute-based Person Retrieval","authors":"Andreas Specker, Arne Schumann, J. Beyerer","doi":"10.1109/AVSS.2019.8909832","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909832","url":null,"abstract":"Person re-identification systems generally rely on a query person image to find additional occurrences of this person across a camera network. In many real-world situations, however, no such query image is available and witness testimony is the only clue upon which to base a search. Cross-modal re-identification based on attribute queries can help in such cases but currently yields a low matching accuracy which is often not sufficient for practical applications. In this work we propose an interactive feedback-driven framework, which successfully bridges the modality gap and achieves a significant increase in accuracy by 47% in mean average precision (mAP) compared to the fully automatic cross-modal state-of-the-art. We further propose a cluster-based feedback method as part of the framework, which outperforms naïve user feedback by more than 9% mAP. Our results set a new state-of-the-art for fully automatic and feedback-driven cross-modal attribute-based re-identification on two public datasets.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114925189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Crowd Behavior Characterization for Scene Tracking","authors":"G. Franchi, Emanuel Aldea, Séverine Dubuisson, I. Bloch","doi":"10.1109/AVSS.2019.8909893","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909893","url":null,"abstract":"In this work, we perform an in-depth analysis of the specific difficulties a crowded scene dataset raises for tracking algorithms. Starting from the standard characteristics depicting the crowd and their limitations, we introduce six entropy measures related to the motion patterns and to the appearance variability of the individuals forming the crowd, and one appearance measure based on Principal Component Analysis. The proposed measures are discussed on synthetic configurations and on multiple real datasets. These criteria are able to characterize the crowd behavior at a more detailed level and may be helpful for evaluating the tracking difficulty of different datasets. The results are in agreement with the perceived difficulty of the scenes.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116900902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D Gait Recognition Based on a CNN-LSTM Network with the Fusion of SkeGEI and DA Features","authors":"Yu Liu, Xinghao Jiang, Tanfeng Sun, Ke Xu","doi":"10.1109/AVSS.2019.8909881","DOIUrl":"https://doi.org/10.1109/AVSS.2019.8909881","url":null,"abstract":"Gait recognition is a promising technology in biometrics in video surveillance applications for its characteristics of non-contact and uniqueness. With the popularization of the Kinect sensor, human gait can be recognized based on the 3D skeletal information. For exploiting raw depth data captured by Kinect device effectively, a novel gait recognition approach based on Skeleton Gait Energy Image (SkeGEI) and Relative Distance and Angle (DA) features fusion is proposed. They are fused in backward to complement each other for gait recognition. In order to maintain as much gait information as possible, a CNN-LSTM network is designed to extract the temporal-spatial deep feature information from SkeGEI and DA features. The experiments evaluated on three datasets show that our approach performs superior to most gait recognition approaches with multi-directional and abnormal patterns.","PeriodicalId":243194,"journal":{"name":"2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134643724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}