{"title":"PatchHAR: A MLP-Like Architecture for Efficient Activity Recognition Using Wearables","authors":"Shuoyuan Wang;Lei Zhang;Xing Wang;Wenbo Huang;Hao Wu;Aiguo Song","doi":"10.1109/TBIOM.2024.3354261","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3354261","url":null,"abstract":"To date, convolutional neural networks have played a dominant role in sensor-based human activity recognition (HAR) scenarios. In 2021, researchers from four institutions almost simultaneously released their newest work to arXiv.org, where each of them independently presented new network architectures mainly consisting of linear layers. This arouses a heated debate whether the current research hotspot in deep learning architectures is returning to MLPs. Inspired by the recent success achieved by MLPs, in this paper, we first propose a lightweight network architecture called all-MLP for HAR, which is entirely built on MLP layers with a gating unit. By dividing multi-channel sensor time series into nonoverlapping patches, all linear layers directly process sensor patches to automatically extract local features, which is able to effectively reduce computational cost. Compared with convolutional architectures, it takes fewer FLOPs and parameters but achieves comparable classification score on WISDM, OPPORTUNITY, PAMAP2 and USC-HAD HAR benchmarks. The additional benefit is that all involved computations are matrix multiplication, which can be readily optimized with popular deep learning libraries. This advantage can promote practical HAR deployment in wearable devices. Finally, we evaluate the actual operation of all-MLP model on a Raspberry Pi platform for real-world human activity recognition simulation. We conclude that the new architecture is not a simple reuse of traditional MLPs in HAR scenario, but is a significant advance over them.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 2","pages":"169-181"},"PeriodicalIF":0.0,"publicationDate":"2024-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EdgeFace: Efficient Face Recognition Model for Edge Devices","authors":"Anjith George;Christophe Ecabert;Hatef Otroshi Shahreza;Ketan Kotwal;Sébastien Marcel","doi":"10.1109/TBIOM.2024.3352164","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3352164","url":null,"abstract":"In this paper, we present EdgeFace - a lightweight and efficient face recognition network inspired by the hybrid architecture of EdgeNeXt. By effectively combining the strengths of both CNN and Transformer models, and a low rank linear layer, EdgeFace achieves excellent face recognition performance optimized for edge devices. The proposed EdgeFace network not only maintains low computational costs and compact storage, but also achieves high face recognition accuracy, making it suitable for deployment on edge devices. The proposed EdgeFace model achieved the top ranking among models with fewer than 2M parameters in the IJCB 2023 Efficient Face Recognition Competition. Extensive experiments on challenging benchmark face datasets demonstrate the effectiveness and efficiency of EdgeFace in comparison to state-of-the-art lightweight models and deep face recognition models. Our EdgeFace model with 1.77M parameters achieves state of the art results on LFW (99.73%), IJB-B (92.67%), and IJB-C (94.85%), outperforming other efficient models with larger computational complexities. The code to replicate the experiments will be made available publicly.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 2","pages":"158-168"},"PeriodicalIF":0.0,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Diffusion for Strong and High Quality Face Morphing Attacks","authors":"Zander W. Blasingame;Chen Liu","doi":"10.1109/TBIOM.2024.3349857","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3349857","url":null,"abstract":"Face morphing attacks seek to deceive a Face Recognition (FR) system by presenting a morphed image consisting of the biometric qualities from two different identities with the aim of triggering a false acceptance with one of the two identities, thereby presenting a significant threat to biometric systems. The success of a morphing attack is dependent on the ability of the morphed image to represent the biometric characteristics of both identities that were used to create the image. We present a novel morphing attack that uses a Diffusion-based architecture to improve the visual fidelity of the image and the ability of the morphing attack to represent characteristics from both identities. We demonstrate the effectiveness of the proposed attack by evaluating its visual fidelity via Fréchet Inception Distance (FID). Also, extensive experiments are conducted to measure the vulnerability of FR systems to the proposed attack. The ability of a morphing attack detector to detect the proposed attack is measured and compared against two state-of-the-art GAN-based morphing attacks along with two Landmark-based attacks. Additionally, a novel metric to measure the relative strength between different morphing attacks is introduced and evaluated.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 1","pages":"118-131"},"PeriodicalIF":0.0,"publicationDate":"2024-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CATFace: Cross-Attribute-Guided Transformer With Self-Attention Distillation for Low-Quality Face Recognition","authors":"Niloufar Alipour Talemi;Hossein Kashiani;Nasser M. Nasrabadi","doi":"10.1109/TBIOM.2023.3349218","DOIUrl":"10.1109/TBIOM.2023.3349218","url":null,"abstract":"Although face recognition (FR) has achieved great success in recent years, it is still challenging to accurately recognize faces in low-quality images due to the obscured facial details. Nevertheless, it is often feasible to make predictions about specific soft biometric (SB) attributes, such as gender, and baldness even in dealing with low-quality images. In this paper, we propose a novel multi-branch neural network that leverages SB attribute information to boost the performance of FR. To this end, we propose a cross-attribute-guided transformer fusion (CATF) module that effectively captures the long-range dependencies and relationships between FR and SB feature representations. The synergy created by the reciprocal flow of information in the dual cross-attention operations of the proposed CATF module enhances the performance of FR. Furthermore, we introduce a novel self-attention distillation framework that effectively highlights crucial facial regions, such as landmarks by aligning low-quality images with those of their high-quality counterparts in the feature space. The proposed self-attention distillation regularizes our network to learn a unified qualityinvariant feature representation in unconstrained environments. We conduct extensive experiments on various FR benchmarks varying in quality. Experimental results demonstrate the superiority of our FR method compared to state-of-the-art FR studies.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 1","pages":"132-146"},"PeriodicalIF":0.0,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139449706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Biometrics, Behavior, and Identity Science Information for Authors","authors":"","doi":"10.1109/TBIOM.2023.3337966","DOIUrl":"https://doi.org/10.1109/TBIOM.2023.3337966","url":null,"abstract":"","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 1","pages":"C3-C3"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10462648","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Biometrics, Behavior, and Identity Science Publication Information","authors":"","doi":"10.1109/TBIOM.2023.3337965","DOIUrl":"https://doi.org/10.1109/TBIOM.2023.3337965","url":null,"abstract":"","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 1","pages":"C2-C2"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10462640","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal Person Verification With Generative Thermal Data Augmentation","authors":"Madina Abdrakhmanova;Timur Unaspekov;Huseyin Atakan Varol","doi":"10.1109/TBIOM.2023.3346938","DOIUrl":"https://doi.org/10.1109/TBIOM.2023.3346938","url":null,"abstract":"The fusion of audio, visual, and thermal modalities has proven effective in developing reliable person verification systems. In this study, we enhanced multimodal person verification performance by augmenting training data using domain transfer methods. Specifically, we enriched the audio-visual-thermal SpeakingFaces dataset with a combination of real audio-visual data and synthetic thermal data from the VoxCeleb dataset. We adapted visual images in VoxCeleb to the thermal domain using CycleGAN, trained on SpeakingFaces. Our results demonstrate the positive impact of augmented training data on all unimodal and multimodal models. The score fusion of unimodal audio, unimodal visual, bimodal, and trimodal systems trained on the combined data achieved the best results on both datasets and exhibited robustness in low-illumination and noisy conditions. Our findings emphasize the importance of utilizing synthetic data, produced by generative methods, to improve deep learning model performance. To facilitate reproducibility and further research in multimodal person verification, we have made our code, pretrained models, and preprocessed dataset freely available in our GitHub repository.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 1","pages":"43-53"},"PeriodicalIF":0.0,"publicationDate":"2023-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved Multiplication-Free Biometric Recognition Under Encryption","authors":"Amina Bassit;Florian F. W. Hahn;Raymond N. J. Veldhuis;Andreas Peter","doi":"10.1109/TBIOM.2023.3340306","DOIUrl":"https://doi.org/10.1109/TBIOM.2023.3340306","url":null,"abstract":"Modern biometric recognition systems extract distinctive feature vectors of biometric samples using deep neural networks to measure the amount of (dis-)similarity between two biometric samples. Studies have shown that personal information (e.g., health condition, ethnicity, etc.) can be inferred, and biometric samples can be reconstructed from those feature vectors, making their protection an urgent necessity. State-of-the-art biometrics protection solutions are based on homomorphic encryption (HE) to perform recognition over encrypted feature vectors, hiding the features and their processing while releasing the outcome only. However, this comes at the cost of those solutions’ efficiency due to the inefficiency of HE-based solutions with a large number of multiplications; for (dis-)similarity measures, this number is proportional to the vector’s dimension. In this paper, we tackle the HE performance bottleneck by freeing the two common (dis-)similarity measures, the cosine similarity and the squared Euclidean distance, from multiplications. Assuming normalized feature vectors, our approach pre-computes and organizes those (dis-)similarity measures into lookup tables. This transforms their computation into simple table lookups and summations only. We integrate the table lookup with HE and introduce pseudo-random permutations to enable cheap plaintext slot selection, which significantly saves the recognition runtime and brings a positive impact on the recognition performance. We then assess their runtime efficiency under encryption and record runtimes between 16.74ms and 49.84ms for both the cleartext and encrypted decision modes over the three security levels, demonstrating their enhanced speed for a compact encrypted reference template reduced to one ciphertext.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 3","pages":"314-325"},"PeriodicalIF":0.0,"publicationDate":"2023-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10347446","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Considerations on the Evaluation of Biometric Quality Assessment Algorithms","authors":"Torsten Schlett;Christian Rathgeb;Juan Tapia;Christoph Busch","doi":"10.1109/TBIOM.2023.3336513","DOIUrl":"https://doi.org/10.1109/TBIOM.2023.3336513","url":null,"abstract":"Quality assessment algorithms can be used to estimate the utility of a biometric sample for the purpose of biometric recognition. “Error versus Discard Characteristic” (EDC) plots, and “partial Area Under Curve” (pAUC) values of curves therein, are generally used by researchers to evaluate the predictive performance of such quality assessment algorithms. An EDC curve depends on an error type such as the “False Non Match Rate” (FNMR), a quality assessment algorithm, a biometric recognition system, a set of comparisons each corresponding to a biometric sample pair, and a comparison score threshold corresponding to a starting error. To compute an EDC curve, comparisons are progressively discarded based on the associated samples’ lowest quality scores, and the error is computed for the remaining comparisons. Additionally, a discard fraction limit or range must be selected to compute pAUC values, which can then be used to quantitatively rank quality assessment algorithms. This paper discusses and analyses various details for this kind of quality assessment algorithm evaluation, including general EDC properties, interpretability improvements for pAUC values based on a hard lower error limit and a soft upper error limit, the use of relative instead of discrete rankings, stepwise vs. linear curve interpolation, and normalisation of quality scores to a [0, 100] integer range. We also analyse the stability of quantitative quality assessment algorithm rankings based on pAUC values across varying pAUC discard fraction limits and starting errors, concluding that higher pAUC discard fraction limits should be preferred. The analyses are conducted both with synthetic data and with real face image and fingerprint quality assessment data, with a focus on general modality-independent conclusions for EDC evaluations. Various EDC alternatives are discussed as well. Open source evaluation software is provided at \u0000<uri>https://github.com/dasec/quality-assessment-evaluation</uri>\u0000. Will be made available upon acceptance.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 1","pages":"54-67"},"PeriodicalIF":0.0,"publicationDate":"2023-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10330743","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140063642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D-SNet: A Lightweight Network for Person Re-Identification on the Small Data Regime","authors":"Wei Li;Shitong Shao;Ziming Qiu;Zhihao Zhu;Aiguo Song","doi":"10.1109/TBIOM.2023.3332285","DOIUrl":"10.1109/TBIOM.2023.3332285","url":null,"abstract":"Currently, researchers incline to employ large-scale datasets as benchmarks for pre-training and fine-tuning models on small-scale datasets to achieve superior performance. However, many researchers cannot afford the enormous computational overhead that pre-training entails, and fine-tuning is easy to compromise the generalization ability of models for the target dataset. Therefore, model learning on the small challenging data regime should be given renewed attention, which will benefit many tasks such as person re-identification. To this end, we propose a novel model named “Two-Dimensional Serpentine Network (2D-SNet)”, which is constructed by multiple lightweight and effective “Two-Dimensional Serpentine Blocks (2D-SBlocks)”. The generalization ability of 2D-SNet stems from three points: (a) 2D-SBlock utilizes multi-scale convolution kernels to extract the multi-scale information from images on the small data regime; (b) 2D-SBlock has a serpentine calculation order, which significantly reduces the number of skip connections and can thereby save many computational and storage resources; (c) 2D-SBlock improves the discrimination ability of 2D-SNet via BN-Depthwise Conv or MSA. As experimentally demonstrated, our proposed 2D-SNet has superiority outstrips closely-related advanced approaches for person re-identification on datasets Market-1501 and CUHK03.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 1","pages":"68-78"},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135612246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}