Title: Sweat Gland Enhancement Method for Fingertip OCT Images Based on Generative Adversarial Network
Authors: Qingran Miao, Haixia Wang, Yilong Zhang, Rui Yan, Yipeng Liu
DOI: 10.1109/TBIOM.2024.3459812
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 6, no. 4, pp. 550-560, published 2024-09-13.

Abstract: Sweat pores are gaining recognition as a secure, reliable, and identifiable third-level fingerprint feature. Collecting sweat pores is challenging when fingers are contaminated, dry, or damaged, since surface pores may become unclear or vanish entirely. Optical Coherence Tomography (OCT) has been applied to the collection of fingertip biometric features; sweat pores mapped from the subcutaneous sweat glands captured by OCT offer higher security and stability. However, speckle noise in OCT images can blur sweat glands, making segmentation and extraction difficult. Traditional denoising methods smear and over-smooth the images, leaving sweat gland contours unclear and destroying structure, while deep learning-based methods have not achieved good results because clean ground-truth images are unavailable. This paper proposes a sweat gland enhancement method for fingertip OCT images based on a Generative Adversarial Network (GAN). It effectively removes speckle noise while eliminating irrelevant structures and repairing lost sweat gland structure, ultimately improving the accuracy of sweat gland segmentation and extraction. To the best of our knowledge, this is the first work to investigate sweat gland enhancement. The method includes a paired dataset generation strategy that extends a small number of manually enhanced ground-truth images into a high-quality paired dataset, and an improved Pix2Pix network for sweat gland enhancement, with a perceptual loss added to mitigate structural distortion during image translation. Notably, once the paired dataset is obtained, any advanced supervised image-to-image translation network can be adapted into the framework for enhancement. Experiments verify the effectiveness of the proposed method.

{"title":"Are Synthetic Datasets Reliable for Benchmarking Generalizable Person Re-Identification?","authors":"Cuicui Kang","doi":"10.1109/TBIOM.2024.3459828","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3459828","url":null,"abstract":"Recent studies show that models trained on synthetic datasets are able to achieve better generalizable person re-identification (GPReID) performance than that trained on public real-world datasets. On the other hand, due to the limitations of real-world person ReID datasets, it would also be important and interesting to use large-scale synthetic datasets as test sets to benchmark person ReID algorithms. Yet this raises a critical question: are synthetic datasets reliable for benchmarking generalizable person re-identification? In the literature there is no evidence showing this. To address this, we design a method called Pairwise Ranking Analysis (PRA) to quantitatively measure the ranking similarity, and a subsequent method called Metric-Independent Statistical Test (MIST) to perform the statistical test of identical distributions. Specifically, we employ Kendall rank correlation coefficients to evaluate pairwise similarity values between algorithm rankings on different datasets. Then, after removing metric dependency via PRA, a non-parametric two-sample Kolmogorov-Smirnov (KS) test is performed for the judgement of whether algorithm ranking correlations between synthetic and real-world datasets and those only between real-world datasets lie in identical distributions. We conduct comprehensive experiments, with twelve representative algorithms, three popular real-world person ReID datasets, and three recently released large-scale synthetic datasets. Through the designed PRA and MIST methods and comprehensive evaluations, we conclude that the recent large-scale synthetic datasets ClonedPerson, UnrealPerson and RandPerson can be reliably used to benchmark GPReID, statistically the same as real-world datasets. Therefore, this study guarantees the usage of synthetic datasets for both source training set and target testing set, with completely no privacy concerns from real-world surveillance data. Besides, the study in this paper might also inspire future designs of synthetic datasets, as the resulting p-values via the proposed MIST method can also be used to assess the reliability of a synthetic dataset for benchmarking algorithms.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 1","pages":"146-155"},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Expression Recognition Based on Feature Representation Learning and Clustering-Based Attention Mechanism","authors":"Lianghai Jin;Liyuan Guo","doi":"10.1109/TBIOM.2024.3454975","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3454975","url":null,"abstract":"Facial expression recognition (FER) plays an important role in many computer vision applications. Generally, FER networks are trained based on the annotated labels or the probability distribution that an expression belongs to seven expression categories. However, the quality of annotations is heavily affected by ambiguous and indistinguishable facial expressions caused by compound and mixed emotions. Furthermore, it is difficult to annotate the seven-dimensional labels (probability distributions). To address these problems, this paper proposes a new FER network model. This model represents each type of facial expression as a high dimensional feature vector, based on which the FER network is trained. The high-dimensional feature representation of each facial expression class is learned by a special binary feature representation generator network. We also develop a clustering-based group split attention mechanism, which enhances the emotion-related features effectively. The experimental results on two lab-controlled datasets and four in-the-wild datasets demonstrate the effectiveness of the proposed FER model by showing clear performance improvements over other state-of-the-art FER methods. Codes are available at <uri>https://github.com/Gabrella/GLA-FNet</uri>.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 2","pages":"182-194"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143698331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Seeing is Not Believing: An Identity Hider for Human Vision Privacy Protection
Authors: Tao Wang, Yushu Zhang, Zixuan Yang, Xiangli Xiao, Hua Zhang, Zhongyun Hua
DOI: 10.1109/TBIOM.2024.3449849
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, no. 2, pp. 170-181, published 2024-08-26.

Abstract: Massive numbers of captured face images are stored in databases for individual identification. However, these images can be observed unintentionally by data examiners, which happens without the individuals' consent and may violate their privacy. Existing protection schemes maintain identifiability but change facial appearance only slightly, leaving the original identity still visually perceptible to data examiners. In this paper, we propose an effective identity hider for human vision protection that significantly changes appearance to visually hide identity while still allowing face recognizers to identify the subject. Concretely, the identity hider benefits from two specially designed modules: 1) a virtual face generation module that generates a virtual face with a new appearance by manipulating the latent space of StyleGAN2; in particular, the virtual face has a parsing map similar to the original face's, supporting other vision tasks such as head pose detection; and 2) an appearance transfer module that transfers the appearance of the virtual face onto the original face via attribute replacement, while identity information is well preserved with the help of disentanglement networks. In addition, diversity and background preservation are supported to meet various requirements. Extensive experiments demonstrate that the proposed identity hider achieves excellent privacy protection and identifiability preservation.

Title: Large-Scale Fully-Unsupervised Re-Identification
Authors: Gabriel Bertocco, Fernanda Andaló, Terrance E. Boult, Anderson Rocha
DOI: 10.1109/TBIOM.2024.3446964
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, no. 2, pp. 156-169, published 2024-08-21.

Abstract: Fully-unsupervised person and vehicle re-identification has received increasing attention due to its broad applicability in areas such as surveillance, forensics, event understanding, and smart cities, without requiring any manual annotation. However, most prior art has been evaluated on datasets of just a couple of thousand samples. Such small-data setups permit techniques with high time and memory costs, such as Re-Ranking, to improve clustering results. Moreover, some previous work even pre-selects the best clustering hyper-parameters for each dataset, which is unrealistic in a large-scale fully-unsupervised scenario. This work tackles a more realistic scenario and proposes two strategies for learning from large-scale unlabeled data. The first performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships. The second leverages a novel Re-Ranking technique with a lower time upper bound and a memory complexity reduced from $\mathcal{O}(n^{2})$ to $\mathcal{O}(kn)$ with $k \ll n$. To avoid pre-selecting specific hyper-parameter values for the clustering algorithm, we also present a novel scheduling algorithm that adjusts the density parameter during training, leveraging sample diversity and keeping learning robust to noisy labeling. Finally, exploiting the complementary knowledge learned by different models in an ensemble, we introduce a co-training strategy that permutes predicted pseudo-labels among the backbones, with no need for hyper-parameter or weighting optimization. The proposed methodology outperforms state-of-the-art methods on well-known benchmarks and on the challenging large-scale Veri-Wild dataset, with a faster, memory-efficient Re-Ranking strategy and a large-scale, noise-robust, ensemble-based learning approach.

Title: A Data Perspective on Ethical Challenges in Voice Biometrics Research
Authors: Anna Leschanowsky, Casandra Rusti, Carolyn Quinlan, Michaela Pnacek, Lauriane Gorce, Wiebke Hutiri
DOI: 10.1109/TBIOM.2024.3446846
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, no. 1, pp. 118-131, published 2024-08-21.

Abstract: Speaker recognition technology, deployed in sectors such as banking, education, recruitment, immigration, law enforcement, and healthcare, relies heavily on biometric data. However, the ethical implications and biases inherent in the datasets driving this technology have not been fully explored. Through a longitudinal study of close to 700 papers published at the ISCA Interspeech Conference from 2012 to 2021, we investigate how dataset use has evolved alongside the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field and examines their usage patterns. The analysis reveals significant shifts in data practices since the advent of deep learning: a small number of datasets dominate speaker recognition training and evaluation, and the majority of studies evaluate their systems on a single dataset. For four key datasets (Switchboard, Mixer, VoxCeleb, and ASVspoof), we conduct a detailed analysis of metadata and collection methods to assess ethical concerns and privacy risks. Our study highlights numerous challenges related to sampling bias, re-identification, consent, disclosure of sensitive information, and security risks in speaker recognition datasets, and emphasizes the need for more representative, fair, and privacy-aware data collection in this domain.

{"title":"Facial Biometrics in the Social Media Era: An In-Depth Analysis of the Challenge Posed by Beautification Filters","authors":"Nelida Mirabet-Herranz;Chiara Galdi;Jean-Luc Dugelay","doi":"10.1109/TBIOM.2024.3438928","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3438928","url":null,"abstract":"Automatic beautification through social media filters has gained popularity in recent years. Users apply face filters to adhere to beauty standards, posing challenges to the reliability of facial images and complicating tasks like automatic face recognition. In this work, the impact of digital beautification is assessed, focusing on the most popular social media filters from three different platforms, on a range of AI-based face analysis technologies: face recognition, gender classification, apparent age estimation, weight estimation, and heart rate assessment. Tests are performed on our extended Facial Features Modification Filters dataset, containing a total of 24312 images and 260 videos. An extensive set of experiments is carried out to show through quantitative metrics the impact of beautification filters on the performance of the different face analysis tasks. The results reveal that employing filters significantly disrupts soft biometric estimation, resulting in a pronounced impact on the performance of weight and heart rate networks. Nevertheless, we observe that certain less aggressive filters do not adversely affect face recognition and gender estimation networks, in some instances enhancing their performances. Scripts and more information are available at \u0000<uri>https://github.com/nmirabeth/filters_biometrics</uri>\u0000.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 1","pages":"108-117"},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: TriGait: Hybrid Fusion Strategy for Multimodal Alignment and Integration in Gait Recognition
Authors: Yan Sun, Xueling Feng, Xiaolei Liu, Liyan Ma, Long Hu, Mark S. Nixon
DOI: 10.1109/TBIOM.2024.3435046
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, no. 1, pp. 82-94, published 2024-07-29.

Abstract: Due to the inherent limitations of single modalities, multimodal fusion has become increasingly popular in many computer vision fields, leveraging the complementary advantages of unimodal methods. Gait recognition, an emerging biometric technology with great application potential, faces similar challenges. The prevailing silhouette-based and skeleton-based methods have complementary limitations: one focuses on appearance information while neglecting structural details, and the other does the opposite. Multimodal gait recognition, combining silhouette and skeleton, promises more robust predictions, but exploring the implicit interaction between dense pixels and discrete coordinate points is both essential and difficult. Most existing multimodal gait recognition methods simply concatenate silhouette and skeleton features and do not fully exploit the complementarity between them. This paper presents a hybrid fusion strategy called TriGait, a three-branch model that thoroughly explores the interaction and complementarity of the two modalities. To handle data heterogeneity and exploit the mutual information of the two modalities, we propose a cross-modal token generator (CMTG) within a fusion branch that aligns and fuses their low-level features. TriGait also has two additional branches for extracting high-level semantic information from silhouette and skeleton. By combining low-level correlation information with high-level semantic information, TriGait provides a comprehensive and discriminative representation of a subject's gait. Extensive experimental results on CASIA-B, Gait3D, and OUMVLP demonstrate the effectiveness of TriGait. Remarkably, TriGait achieves rank-1 mean accuracies of 96.6%, 61.4%, and 91.1% on CASIA-B, Gait3D, and OUMVLP respectively, outperforming the state-of-the-art methods. The source code is available at https://github.com/YanSun-github/TriGait/.

Title: Diving Into Sample Selection for Facial Expression Recognition With Noisy Annotations
Authors: Wei Nie, Zhiyong Wang, Xinming Wang, Bowen Chen, Hanlin Zhang, Honghai Liu
DOI: 10.1109/TBIOM.2024.3435498
IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 7, no. 1, pp. 95-107, published 2024-07-29.

Abstract: Real-world facial expression recognition (FER) suffers from noisy labels due to ambiguous expressions and subjective annotation. Addressing noisy-label FER involves two core issues: the efficient utilization of clean samples and the effective utilization of noisy samples. However, existing methods demonstrate their effectiveness solely through generalization improvements obtained by training on all corrupted data, making it difficult to ascertain whether the observed improvement genuinely addresses these two issues. To disentangle them, this paper focuses on efficiently utilizing clean samples by diving into sample selection. Specifically, we enhance the classical noisy-label learning method Co-divide with two straightforward modifications, yielding a noisy-label discriminator better suited to FER, termed IntraClass-divide. First, IntraClass-divide constructs a class-separate two-component Gaussian Mixture Model (GMM) for each category instead of a shared GMM for all categories. Second, it simplifies the framework by eliminating the dual-network training scheme. In addition to achieving leading sample selection performance of nearly 95% Micro-F1 under the standard synthetic noise paradigm, we are the first to propose a natural noise paradigm, under which we also achieve a leading sample selection performance of 82.63% Micro-F1. Moreover, training a ResNet18 on the clean samples identified by IntraClass-divide yields better generalization than previous sophisticated noisy-label FER models trained on all corrupted data.

{"title":"Message From the Editor-in-Chief","authors":"Nalini Ratha","doi":"10.1109/TBIOM.2024.3420490","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3420490","url":null,"abstract":"My three-year tenure as the Editor-in-Chief (EiC) of the IEEE Transactions on Biometrics, Behavior, and Identity Science (T-BIOM) draws to a close this June 2024. It’s been an exciting time to witness T-BIOM’s continued growth as a leading journal in biometrics research with a consistent rise in paper quality, thanks to the selection of top-reviewed papers from premier IEEE Biometrics Council conferences like IJCB, and IEEE Face and Gesture.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 3","pages":"288-288"},"PeriodicalIF":0.0,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10604473","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141725589","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}