{"title":"Learning a Fixed-Length Fingerprint Representation.","authors":"Joshua J Engelsma, Kai Cao, Anil K Jain","doi":"10.1109/TPAMI.2019.2961349","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2961349","url":null,"abstract":"<p><p>We present DeepPrint, a deep network, which learns to extract fixed-length fingerprint representations of only 200 bytes. DeepPrint incorporates fingerprint domain knowledge, including alignment and minutiae detection, into the deep network architecture to maximize the discriminative power of its representation. The compact, DeepPrint representation has several advantages over the prevailing variable length minutiae representation which (i) requires computationally expensive graph matching techniques, (ii) is difficult to secure using strong encryption schemes (e.g., homomorphic encryption), and (iii) has low discriminative power in poor quality fingerprints where minutiae extraction is unreliable. We benchmark DeepPrint against two top performing COTS SDKs (Verifinger and Innovatrics) from the NIST and FVC evaluations. Coupled with a re-ranking scheme, the DeepPrint rank-1 search accuracy on the NIST SD4 dataset against a gallery of 1.1 million fingerprints is comparable to the top COTS matcher, but it is significantly faster (DeepPrint: 98.80% in 0.3 seconds vs. COTS A: 98.85% in 27 seconds). To the best of our knowledge, the DeepPrint representation is the most compact and discriminative fixed-length fingerprint representation reported in the academic literature.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1981-1997"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2961349","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37486548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single Day Outdoor Photometric Stereo.","authors":"Yannick Hold-Geoffroy, Paulo Gotardo, Jean-Francois Lalonde","doi":"10.1109/TPAMI.2019.2962693","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2962693","url":null,"abstract":"<p><p>Photometric Stereo (PS) under outdoor illumination remains a challenging, ill-posed problem due to insufficient variability in illumination. Months-long capture sessions are typically used in this setup, with little success on shorter, single-day time intervals. In this paper, we investigate the solution of outdoor PS over a single day, under different weather conditions. First, we investigate the relationship between weather and surface reconstructability in order to understand when natural lighting allows existing PS algorithms to work. Our analysis reveals that partially cloudy days improve the conditioning of the outdoor PS problem while sunny days do not allow the unambiguous recovery of surface normals from photometric cues alone. We demonstrate that calibrated PS algorithms can thus be employed to reconstruct Lambertian surfaces accurately under partially cloudy days. Second, we solve the ambiguity arising in clear days by combining photometric cues with prior knowledge on material properties, local surface geometry and the natural variations in outdoor lighting through a CNN-based, weakly-calibrated PS technique. Given a sequence of outdoor images captured during a single sunny day, our method robustly estimates the scene surface normals with unprecedented quality for the considered scenario. Our approach does not require precise geolocation and significantly outperforms several state-of-the-art methods on images with real lighting, showing that our CNN can combine efficiently learned priors and photometric cues available during a single sunny day.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"2062-2074"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2962693","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37510063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Detection of Pain from Facial Expressions: A Survey.","authors":"Teena Hassan, Dominik Seus, Johannes Wollenberg, Katharina Weitz, Miriam Kunz, Stefan Lautenbacher, Jens-Uwe Garbas, Ute Schmid","doi":"10.1109/TPAMI.2019.2958341","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2958341","url":null,"abstract":"<p><p>Pain sensation is essential for survival, since it draws attention to physical threat to the body. Pain assessment is usually done through self-reports. However, self-assessment of pain is not available in the case of noncommunicative patients, and therefore, observer reports should be relied upon. Observer reports of pain could be prone to errors due to subjective biases of observers. Moreover, continuous monitoring by humans is impractical. Therefore, automatic pain detection technology could be deployed to assist human caregivers and complement their service, thereby improving the quality of pain management, especially for noncommunicative patients. Facial expressions are a reliable indicator of pain, and are used in all observer-based pain assessment tools. Following the advancements in automatic facial expression analysis, computer vision researchers have tried to use this technology for developing approaches for automatically detecting pain from facial expressions. This paper surveys the literature published in this field over the past decade, categorizes it, and identifies future research directions. The survey covers the pain datasets used in the reviewed literature, the learning tasks targeted by the approaches, the features extracted from images and image sequences to represent pain-related information, and finally, the machine learning methods used.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1815-1831"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2958341","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37447718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Memory- and Accuracy-Aware Gaussian Parameter-Based Stereo Matching Using Confidence Measure.","authors":"Yeongmin Lee, Chong-Min Kyung","doi":"10.1109/TPAMI.2019.2959613","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2959613","url":null,"abstract":"<p><p>Accurate stereo matching requires a large amount of memory at a high bandwidth, which restricts its use in resource-limited systems such as mobile devices. This problem is compounded by the recent trend of applications requiring significantly high pixel resolution and disparity levels. To alleviate this, we present a memory-efficient and robust stereo matching algorithm. For cost aggregation, we employ the semiglobal parametric approach, which significantly reduces the memory bandwidth by representing the costs of all disparities as a Gaussian mixture model. All costs on multiple paths in an image are aggregated by updating the Gaussian parameters. The aggregation is performed during the scanning in the forward and backward directions. To reduce the amount of memory for the intermediate results during the forward scan, we suggest to store only the Gaussian parameters which contribute significantly to the final disparity selection. We also propose a method to enhance the overall procedure through a learning-based confidence measure. The random forest framework is used to train various features which are extracted from the cost and intensity profile. The experimental results on KITTI dataset show that the proposed method reduces the memory requirement to less than 3 percent of that of semiglobal matching (SGM) while providing a robust depth map compared to those of state-of-the-art SGM-based algorithms.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1845-1858"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2959613","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37484221","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Speed and High Dynamic Range Video with an Event Camera.","authors":"Henri Rebecq, Rene Ranftl, Vladlen Koltun, Davide Scaramuzza","doi":"10.1109/TPAMI.2019.2963386","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2963386","url":null,"abstract":"<p><p>Event cameras are novel sensors that report brightness changes in the form of a stream of asynchronous \"events\" instead of intensity frames. They offer significant advantages with respect to conventional cameras: high temporal resolution, high dynamic range, and no motion blur. While the stream of events encodes in principle the complete visual signal, the reconstruction of an intensity image from a stream of events is an ill-posed problem in practice. Existing reconstruction approaches are based on hand-crafted priors and strong assumptions about the imaging process as well as the statistics of natural images. In this work we propose to learn to reconstruct intensity images from event streams directly from data instead of relying on any hand-crafted priors. We propose a novel recurrent network to reconstruct videos from a stream of events, and train it on a large amount of simulated event data. During training we propose to use a perceptual loss to encourage reconstructions to follow natural image statistics. We further extend our approach to synthesize color images from color event streams. Our quantitative experiments show that our network surpasses state-of-the-art reconstruction methods by a large margin in terms of image quality ( ), while comfortably running in real-time. We show that the network is able to synthesize high framerate videos ( frames per second) of high-speed phenomena (e.g., a bullet hitting an object) and is able to provide high dynamic range reconstructions in challenging lighting conditions. As an additional contribution, we demonstrate the effectiveness of our reconstructions as an intermediate representation for event data. We show that off-the-shelf computer vision algorithms can be applied to our reconstructions for tasks such as object classification and visual-inertial odometry and that this strategy consistently outperforms algorithms that were specifically designed for event data. We release the reconstruction code, a pre-trained model and the datasets to enable further research.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"43 6","pages":"1964-1980"},"PeriodicalIF":23.6,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2963386","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37512254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geodesic Multi-Class SVM with Stiefel Manifold Embedding.","authors":"Rui Zhang, Xuelong Li, Hongyuan Zhang, Ziheng Jiao","doi":"10.1109/TPAMI.2021.3069498","DOIUrl":"10.1109/TPAMI.2021.3069498","url":null,"abstract":"<p><p>Manifold of geodesic plays an essential role in characterizing the intrinsic data geometry. However, the existing SVM methods have largely neglected the manifold structure. As such, functional degeneration may occur due to the potential polluted training. Even worse, the entire SVM model might collapse in the presence of excessive training contamination. To address these issues, this paper devises a manifold SVM method based on a novel ξ -measure geodesic, whose primary design objective is to extract and preserve the data manifold structure in the presence of training noises. To further cope with overly contaminated training data, we introduce Kullback-Leibler (KL) regularization with steerable sparsity constraint. In this way, each loss weight is adaptively obtained by obeying the prior distribution and sparse activation during model training for robust fitting. Moreover, the optimal scale for Stiefel manifold can be automatically learned to improve the model flexibility. Accordingly, extensive experiments verify and validate the superiority of the proposed method.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2021-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25531112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FakeCatcher: Detection of Synthetic Portrait Videos using Biological Signals.","authors":"Umur Aybars Ciftci, Ilke Demir, Lijun Yin","doi":"10.1109/TPAMI.2020.3009287","DOIUrl":"10.1109/TPAMI.2020.3009287","url":null,"abstract":"<p><p>The recent proliferation of fake portrait videos poses direct threats on society, law, and privacy [1]. Believing the fake video of a politician, distributing fake pornographic content of celebrities, fabricating impersonated fake videos as evidence in courts are just a few real world consequences of deep fakes. We present a novel approach to detect synthetic content in portrait videos, as a preventive solution for the emerging threat of deep fakes. In other words, we introduce a deep fake detector. We observe that detectors blindly utilizing deep learning are not effective in catching fake content, as generative models produce formidably realistic results. Our key assertion follows that biological signals hidden in portrait videos can be used as an implicit descriptor of authenticity, because they are neither spatially nor temporally preserved in fake content. To prove and exploit this assertion, we first engage several signal transformations for the pairwise separation problem, achieving 99.39% accuracy. Second, we utilize those findings to formulate a generalized classifier for fake content, by analyzing proposed signal transformations and corresponding feature sets. Third, we generate novel signal maps and employ a CNN to improve our traditional classifier for detecting synthetic content. Lastly, we release an \"in the wild\" dataset of fake portrait videos that we collected as a part of our evaluation process. We evaluate FakeCatcher on several datasets, resulting with 96%, 94.65%, 91.50%, and 91.07% accuracies, on Face Forensics [2], Face Forensics++ [3], CelebDF [4], and on our new Deep Fakes Dataset respectively. In addition, our approach produces a significantly superior detection rate against baselines, and does not depend on the source, generator, or properties of the fake content. We also analyze signals from various facial regions, under image distortions, with varying segment durations, from different generators, against unseen datasets, and under several dimensionality reduction techniques.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"PP ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2020-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38228143","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differential 3D Facial Recognition: Adding 3D to Your State-of-the-Art 2D Method.","authors":"J Matias Di Martino, Fernando Suzacq, Mauricio Delbracio, Qiang Qiu, Guillermo Sapiro","doi":"10.1109/TPAMI.2020.2986951","DOIUrl":"10.1109/TPAMI.2020.2986951","url":null,"abstract":"<p><p>Active illumination is a prominent complement to enhance 2D face recognition and make it more robust, e.g., to spoofing attacks and low-light conditions. In the present work we show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features, while bypassing the complicated task of 3D reconstruction. The key idea is to project over the test face a high spatial frequency pattern, which allows us to simultaneously recover real 3D information plus a standard 2D facial image. Therefore, state-of-the-art 2D face recognition solution can be transparently applied, while from the high frequency component of the input image, complementary 3D facial features are extracted. Experimental results on ND-2006 dataset show that the proposed ideas can significantly boost face recognition performance and dramatically improve the robustness to spoofing attacks.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"42 7","pages":"1582-1593"},"PeriodicalIF":23.6,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7892197/pdf/nihms-1668137.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9150865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Table of Contents","authors":"","doi":"10.1109/tpami.2020.2995283","DOIUrl":"https://doi.org/10.1109/tpami.2020.2995283","url":null,"abstract":"","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":" ","pages":""},"PeriodicalIF":23.6,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/tpami.2020.2995283","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46180837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recurrent Temporal Aggregation Framework for Deep Video Inpainting.","authors":"Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon","doi":"10.1109/TPAMI.2019.2958083","DOIUrl":"https://doi.org/10.1109/TPAMI.2019.2958083","url":null,"abstract":"<p><p>Video inpainting aims to fill in spatio-temporal holes in videos with plausible content. Despite tremendous progress on deep learning-based inpainting of a single image, it is still challenging to extend these methods to video domain due to the additional time dimension. In this paper, we propose a recurrent temporal aggregation framework for fast deep video inpainting. In particular, we construct an encoder-decoder model, where the encoder takes multiple reference frames which can provide visible pixels revealed from the scene dynamics. These hints are aggregated and fed into the decoder. We apply a recurrent feedback in an auto-regressive manner to enforce temporal consistency in the video results. We propose two architectural designs based on this framework. Our first model is a blind video decaptioning network (BVDNet) that is designed to automatically remove and inpaint text overlays in videos without any mask information. Our BVDNet wins the first place in the ECCV Chalearn 2018 LAP Inpainting Competition Track 2: Video Decaptioning. Second, we propose a network for more general video inpainting (VINet) to deal with more arbitrary and larger holes. Video results demonstrate the advantage of our framework compared to state-of-the-art methods both qualitatively and quantitatively. The codes are available at https://github.com/mcahny/Deep-Video-Inpainting, and https://github.com/shwoo93/video_decaptioning.</p>","PeriodicalId":13426,"journal":{"name":"IEEE Transactions on Pattern Analysis and Machine Intelligence","volume":"42 5","pages":"1038-1052"},"PeriodicalIF":23.6,"publicationDate":"2020-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TPAMI.2019.2958083","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37452544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}