{"title":"Discriminative Multiview Learning for Robust Palmprint Feature Representation and Recognition","authors":"Shuyi Li;Jianhang Zhou;Bob Zhang;Lifang Wu;Meng Jian","doi":"10.1109/TBIOM.2024.3401574","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3401574","url":null,"abstract":"Binary-based feature representation methods have received increasing attention in palmprint recognition due to their high efficiency and great robustness to illumination variation. However, most of them are hand-designed descriptors that generally require much prior knowledge in their design. On the other hand, conventional single-view palmprint recognition approaches have difficulty in expressing the features of each sample strongly, especially low-quality palmprint images. To solve these problems, in this paper, we propose a novel discriminative multiview learning method, named Row-sparsity Binary Feature Learning-based Multiview (RsBFL_Mv) representation, for palmprint recognition. Specifically, given the training multiview data, RsBFL_Mv jointly learns multiple projection matrices that transform the informative multiview features into discriminative binary codes. Afterwards, the learned binary codes of each view are converted to the real-value map. Following this, we calculate the histograms of multiview feature maps and concatenate them for matching. For RsBFL_Mv, we enforce three criteria: 1) the quantization error between the projected real-valued features and the binary features of each view is minimized, at the same time, the projection error is minimized; 2) the salient label information for each view is utilized to minimize the distance of the within-class samples and simultaneously maximize the distance of the between-class samples; 3) the \u0000<inline-formula> <tex-math>$l_{2,1}$ </tex-math></inline-formula>\u0000 norm is used to make the learned projection matrices to extract more representative features. Extensive experimental results on two publicly accessible palmprint datasets demonstrated the effectiveness of the proposed method in recognition accuracy and computational efficiency. Furthermore, additional experiments are conducted on two commonly used finger vein datasets that verified the powerful generalization capability of the proposed method.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 3","pages":"304-313"},"PeriodicalIF":0.0,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Survey of EEG-Based Driver State and Behavior Detection for Intelligent Vehicles","authors":"Jiawei Ju;Hongqi Li","doi":"10.1109/TBIOM.2024.3400866","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3400866","url":null,"abstract":"The driver’s state and behavior are crucial for the driving process, which affect the driving safety directly or indirectly. Electroencephalography (EEG) signals have the advantage of predictability and have been widely used to detect and predict the users’ states and behaviors. Accordingly, the EEG-based driver state and behavior detection, which can be integrated into the intelligent vehicles, is becoming the hot research topic to develop an intelligent assisted driving system (IADS). In this paper, we systematically reviewed the EEG-based driver state and behavior detection for intelligent vehicles. First, we concluded the most popular methods for EEG-based IADS, including the algorithms of the signal acquisition, preprocessing, signal enhancement, feature calculation, feature selection, classification, and post-processing. Then, we surveyed the research on separate EEG-based driver state detection and the driver behavior detection, respectively. The research on EEG-based combinations of driver state and behavior detection was further reviewed. For the review of these studies of driver state, behavior, and combined state and behavior, we not only defined the related fundamental information and overviewed the research on single EEG-based brain-computer interface (BCI) applications, but also further explored the relevant research progress on the EEG-based hybrid BCIs. Finally, we thoroughly discussed the current challenges, possible solutions, and future research directions.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 3","pages":"420-434"},"PeriodicalIF":0.0,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fake It Till You Recognize It: Quality Assessment for Human Action Generative Models","authors":"Bruno Degardin;Vasco Lopes;Hugo Proença","doi":"10.1109/TBIOM.2024.3375453","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3375453","url":null,"abstract":"Skeleton-based generative modelling is an important research topic to mitigate the heavy annotation process. In this work, we explore the impact of synthetic data on skeleton-based action recognition alongside its evaluation methods for more precise quality extraction. We propose a novel iterative weakly-supervised learning generative strategy for synthesising high-quality human actions. We combine conditional generative models with Bayesian classifiers to select the highest-quality samples. As an essential factor, we designed a discriminator network that, together with a Bayesian classifier relies on the most realistic instances to augment the amount of data available for the next iteration without requiring standard cumbersome annotation processes. Additionally, as a key contribution to assessing the quality of samples, we propose a novel measure based on human kinematics instead of employing commonly used evaluation methods, which are heavily based on images. The rationale is to capture the intrinsic characteristics of human skeleton dynamics, thereby complementing model comparison and alleviating the need to manually select the best samples. Experiments were carried out over four benchmarks of two well-known datasets (NTU RGB+D and NTU-120 RGB+D), where both our framework and model assessment can notably enhance skeleton-based action recognition and generation models by synthesising high-quality and realistic human actions.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 2","pages":"261-271"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"eDifFIQA: Towards Efficient Face Image Quality Assessment Based on Denoising Diffusion Probabilistic Models","authors":"Žiga Babnik;Peter Peer;Vitomir Štruc","doi":"10.1109/TBIOM.2024.3376236","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3376236","url":null,"abstract":"State-of-the-art Face Recognition (FR) models perform well in constrained scenarios, but frequently fail in difficult real-world scenarios, when no quality guarantees can be made for face samples. For this reason, Face Image Quality Assessment (FIQA) techniques are often used by FR systems, to provide quality estimates of captured face samples. The quality estimate provided by FIQA techniques can be used by the FR system to reject samples of low-quality, in turn improving the performance of the system and reducing the number of critical false-match errors. However, despite steady improvements, ensuring a good trade-off between the performance and computational complexity of FIQA methods across diverse face samples remains challenging. In this paper, we present DifFIQA, a powerful unsupervised approach for quality assessment based on the popular denoising diffusion probabilistic models (DDPMs) and the extended (eDifFIQA) approach. The main idea of the base DifFIQA approach is to utilize the forward and backward processes of DDPMs to perturb facial images and quantify the impact of these perturbations on the corresponding image embeddings for quality prediction. Because of the iterative nature of DDPMs the base DifFIQA approach is extremely computationally expensive. Using eDifFIQA we are able to improve on both the performance and computational complexity of the base DifFIQA approach, by employing label optimized knowledge distillation. In this process, quality information inferred by DifFIQA is distilled into a quality-regression model. During the distillation process we use an additional source of quality information hidden in the relative position of the embedding to further improve the predictive capabilities of the underlying regression model. By choosing different feature extraction backbone models as the basis for the quality-regression eDifFIQA model, we are able to control the trade-off between the predictive capabilities and computational complexity of the final model. We evaluate three eDifFIQA variants of varying sizes in comprehensive experiments on 7 diverse datasets containing static-images and a separate video-based dataset, with 4 target CNN-based FR models and 2 target Transformer-based FR models and against 10 state-of-the-art FIQA techniques, as well as against the initial DifFIQA baseline and a simple regression-based predictor DifFIQA(R), distilled from DifFIQA without any additional optimization. The results show that the proposed label optimized knowledge distillation improves on the performance and computationally complexity of the base DifFIQA approach, and is able to achieve state-of-the-art performance in several distinct experimental scenarios. 
Furthermore, we also show that the distilled model can be used directly for face recognition and leads to highly competitive results.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 4","pages":"458-474"},"PeriodicalIF":0.0,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142713790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
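The distillation step can be sketched compactly. A minimal sketch, assuming precomputed DifFIQA teacher quality scores and a plain MSE regression objective (the backbone choice and the bare MSE loss are assumptions; eDifFIQA additionally optimizes the labels themselves and exploits the relative position of embeddings):

```python
import torch
import torch.nn as nn

class QualityRegressor(nn.Module):
    """Student model: any FR feature extractor plus a scalar quality head."""
    def __init__(self, backbone: nn.Module, feat_dim: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(feat_dim, 1)

    def forward(self, x):
        return self.head(self.backbone(x)).squeeze(-1)

def distillation_step(model, images, teacher_quality, optimizer):
    """One training step: regress the teacher's (DifFIQA) quality scores,
    so inference later needs a single forward pass instead of iterative
    diffusion perturbations."""
    optimizer.zero_grad()
    pred = model(images)
    loss = nn.functional.mse_loss(pred, teacher_quality)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Swapping in smaller or larger backbones for `backbone` is what gives the three variants their different accuracy/complexity trade-offs.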
{"title":"Rediscovering Minutiae Matching Through One Shot Learning’s Siamese Framework in Poor Quality Footprint Images","authors":"Riti Kushwaha;Gaurav Singal;Neeta Nain","doi":"10.1109/TBIOM.2024.3399402","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3399402","url":null,"abstract":"Footprint biometrics is one of the emerging techniques, which can be utilized in different security systems. A human footprint has unique traits which is sufficient to recognize any person. Existing work evaluates the shape features and texture features but very few authors have explored minutiae features, hence this article provides a study based on minutiae features. The current State-of-the-art methods utilize machine learning techniques, which suffer from low accuracy in case of poor-quality of data. These machine learning techniques provide approx 97% accuracy while using good quality images but are not able to perform well when we use poor quality images. We have proposed a minutiae matching system based on deep learning techniques which is able to handle samples with adequate noise. We have used Convolution Neural Network for the feature extraction. It uses two different ridge flow estimation methods, i.e., ConvNet-based and dictionary-based. Furthermore, fingerprint-matching metrics are used for footprint feature evaluation. We initially employed a contrastive-based loss function, resulting in an accuracy of 56%. Subsequently, we adapted our approach by implementing a distance-based loss function, which improved the accuracy to 66%.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 3","pages":"398-408"},"PeriodicalIF":0.0,"publicationDate":"2024-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SFace2: Synthetic-Based Face Recognition With w-Space Identity-Driven Sampling","authors":"Fadi Boutros;Marco Huber;Anh Thi Luu;Patrick Siebke;Naser Damer","doi":"10.1109/TBIOM.2024.3371502","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3371502","url":null,"abstract":"The use of synthetic data for training neural networks has recently received increased attention, especially in the area of face recognition. This was mainly motivated by the increase of privacy, ethical, and legal concerns of using privacy-sensitive authentic data to train face recognition models. Many authentic datasets such as MS-Celeb-1M or VGGFace2 that have been widely used to train state-of-the-art deep face recognition models are retracted and officially no longer maintained or provided by official sources as they often have been collected without explicit consent. Toward this end, we first propose a synthetic face generation approach, SFace which utilizes a class-conditional generative adversarial network to generate class-labeled synthetic face images. To evaluate the privacy aspect of using such synthetic data in face recognition development, we provide an extensive evaluation of the identity relation between the generated synthetic dataset and the original authentic dataset used to train the generative model. The investigation proved that the associated identity of the authentic dataset to the one with the same class label in the synthetic dataset is hardly possible, strengthening the possibility for privacy-aware face recognition training. We then propose three different learning strategies to train the face recognition model on our privacy-friendly dataset, SFace, and report the results on five authentic benchmarks, demonstrating its high potential. Noticing the relatively low (in comparison to authentic data) identity discrimination in SFace, we started by analysing the w-space of the class-conditional generator, finding identity information that is highly correlated to that in the embedding space. Based on this finding, we proposed an approach that performs the sampling in the w-space driven to generate data with higher identity discrimination, the SFace2. Our experiments showed the disentanglement of the latent w-space and the benefit of training face recognition models on the more identity-discriminated synthetic dataset SFace2.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 3","pages":"290-303"},"PeriodicalIF":0.0,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SSPRA: A Robust Approach to Continuous Authentication Amidst Real-World Adversarial Challenges","authors":"Frank Chen;Jingyu Xin;Vir V. Phoha","doi":"10.1109/TBIOM.2024.3369590","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3369590","url":null,"abstract":"In real-world deployment, continuous authentication for mobile devices faces challenges such as intermittent data streams, variable data quality, and varying modality reliability. To address these challenges, we introduce a framework based on Markov process, named State-Space Perturbation-Resistant Approach (SSPRA). SSPRA integrates a two-level multi-modality fusion mechanism and dual state transition machines (STMs). This two-level fusion integrates probabilities from available modalities at each inspection (vertical-level) and evolves state probabilities over time (horizontal-level), thereby enhancing decision accuracy. It effectively manages modality disruptions and adjusts to variations in modality reliability. The dual STMs trigger appropriate responses upon detecting suspicious data, managing data fluctuations and extending operational duration, thus improving user experience. In our simulations, covering standard operations and adversarial scenarios like zero to non-zero-effort (ZE/NZE) attacks, modality disconnections, and data fluctuations, SSPRA consistently outperformed all baselines, including Sim’s HMM and three state-of-the-art deep-learning models. Notably, in adversarial attack scenarios, SSPRA achieved substantial reductions in False Alarm Rate (FAR) - 36.31%, 36.58%, and 8.26% - and improvements in True Alarm Rate (TAR) - 33.15%, 33.75%, and 5.1% compared to the DeepSense, Siamese-structured network, and UMSNet models, respectively. Furthermore, it outperformed all baselines in modality disconnection and fluctuation scenarios, underscoring SSPRA’s potential in addressing real-world challenges in mobile device authentication.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 2","pages":"245-260"},"PeriodicalIF":0.0,"publicationDate":"2024-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"From Modalities to Styles: Rethinking the Domain Gap in Heterogeneous Face Recognition","authors":"Anjith George;Sébastien Marcel","doi":"10.1109/TBIOM.2024.3365350","DOIUrl":"10.1109/TBIOM.2024.3365350","url":null,"abstract":"Heterogeneous Face Recognition (HFR) focuses on matching faces from different domains, for instance, thermal to visible images, making Face Recognition (FR) systems more versatile for challenging scenarios. However, the domain gap between these domains and the limited large-scale datasets in the target HFR modalities make it challenging to develop robust HFR models from scratch. In our work, we view different modalities as distinct styles and propose a method to modulate feature maps of the target modality to address the domain gap. We present a new Conditional Adaptive Instance Modulation (CAIM) module that seamlessly fits into existing FR networks, turning them into HFR-ready systems. The CAIM block modulates intermediate feature maps, efficiently adapting to the style of the source modality and bridging the domain gap. Our method enables end-to-end training using a small set of paired samples. We extensively evaluate the proposed approach on various challenging HFR benchmarks, showing that it outperforms state-of-the-art methods. The source code and protocols for reproducing the findings will be made publicly available.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 4","pages":"475-485"},"PeriodicalIF":0.0,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140673423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GSCL: Generative Self-Supervised Contrastive Learning for Vein-Based Biometric Verification","authors":"Wei-Feng Ou;Lai-Man Po;Xiu-Feng Huang;Wing-Yin Yu;Yu-Zhi Zhao","doi":"10.1109/TBIOM.2024.3364021","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3364021","url":null,"abstract":"Vein-based biometric technology offers secure identity authentication due to the concealed nature of blood vessels. Despite the promising performance of deep learning-based biometric vein recognition, the scarcity of vein data hinders the discriminative power of deep features, thus affecting overall performance. To tackle this problem, this paper presents a generative self-supervised contrastive learning (GSCL) scheme, designed from a data-centric viewpoint to fully mine the potential prior knowledge from limited vein data for improving feature representations. GSCL first utilizes a style-based generator to model vein image distribution and then generate numerous vein image samples. These generated vein images are then leveraged to pretrain the feature extraction network via self-supervised contrastive learning. Subsequently, the network undergoes further fine-tuning using the original training data in a supervised manner. This systematic combination of generative and discriminative modeling allows the network to comprehensively excavate the semantic prior knowledge inherent in vein data, ultimately improving the quality of feature representations. In addition, we investigate a multi-template enrollment method for improving practical verification accuracy. Extensive experiments conducted on public finger vein and palm vein databases, as well as a newly collected finger vein video database, demonstrate the effectiveness of GSCL in improving representation quality.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 2","pages":"230-244"},"PeriodicalIF":0.0,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cascade Transformer Reasoning Embedded by Uncertainty for Occluded Person Re-Identification","authors":"Hanqing Zheng;Yuxuan Shi;Hefei Ling;Zongyi Li;Runsheng Wang;Zhongyang Li;Ping Li","doi":"10.1109/TBIOM.2024.3361677","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3361677","url":null,"abstract":"Occluded person re-identification is a challenging task due to various noise introduced by occlusion. Previous methods utilize body detectors to exploit more clues which are overdependent on accuracy of detection results. In this paper, we propose a model named Cascade Transformer Reasoning Embedded by Uncertainty Network (CTU) which does not require external information. Self-attention of the transformer models long-range dependency to capture difference between pixels, which helps the model focus on discriminative information of human bodies. However, noise such as occlusion will bring a high level of uncertainty to feature learning and makes self-attention learn undesirable dependency. We invent a novel structure named Uncertainty Embedded Transformer (UT) Layer to involve uncertainty in computing attention weights of self-attention. Introducing uncertainty mechanism helps the network better evaluate the dependency between pixels and focus more on human bodies. Additionally, our proposed transformer layer generates an attention mask through Cascade Attention Module (CA) to guide the next layer to focus more on key areas of the feature map, decomposing feature learning into cascade stages. Extensive experiments over challenging datasets Occluded-DukeMTMC, P-DukeMTMC, etc., verify the effectiveness of our method.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 2","pages":"219-229"},"PeriodicalIF":0.0,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}