{"title":"Step Count Print: A Physical Activity-Based Biometric Identifier for User Identification and Authentication","authors":"Zhen Chen;Keqin Shi;Weiqiang Sun","doi":"10.1109/TBIOM.2024.3466269","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3466269","url":null,"abstract":"Step count is one of the most widely used physical activity data and is easily accessible through smart phones and wearable devices. It records the intensity and happening time of a user’s physical activities, and often reflects a users’ unique way of living. Incorporation of step count into biometric systems may thus offer an opportunity to develop innovative, user-friendly and non-invasive strategies of user identification and authentication. In this paper, we propose Step Count Print (SCP), a physical activity-based novel biometric identifier. Extracted from coarse-grained minute-level physical activity data (step counts), SCP contains features, including user step cadence distribution and average step distribution etc., that reflect an individual’s physical activity behavior. With data collected from 100 users in a five-year long period, we conducted an ablation study to demonstrate the non-redundancy of SCP in user identification and authentication scenarios using commonly used machine learning algorithms. The results show that SCP can achieve a Rank-1 rate of up to 75.0% in user identification scenarios and an average accuracy of 92.3% in user authentication scenarios. In different classification algorithms, the user’s accuracy histogram is drawn to demonstrate the universality of SCP and its effectiveness across a range of scenarios and use cases.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 2","pages":"210-224"},"PeriodicalIF":0.0,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143698235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Multi-Scale Knowledge-Guided Features for Text-Guided Face Recognition","authors":"Md Mahedi Hasan;Shoaib Meraj Sami;Nasser M. Nasrabadi;Jeremy Dawson","doi":"10.1109/TBIOM.2024.3466216","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3466216","url":null,"abstract":"Text-guided face recognition (TGFR) aims to improve the performance of state-of-the-art face recognition (FR) algorithms by incorporating auxiliary information, such as distinct facial marks and attributes, provided as natural language descriptions. Current TGFR algorithms have been proven to be highly effective in addressing performance drops in state-of-the-art FR models, particularly in scenarios involving sensor noise, low resolution, and turbulence effects. Although existing methods explore various algorithms using different cross-modal alignment and fusion techniques, they encounter practical limitations in real-world applications. For example, during inference, textual descriptions associated with face images may be missing, lacking crucial details, or incorrect. Furthermore, the presence of inherent modality heterogeneity poses a significant challenge in achieving effective cross-modal alignment. To address these challenges, we introduce CaptionFace, a TGFR framework that integrates GPTFace, a face image captioning model designed to generate context-rich natural language descriptions from low-resolution facial images. By leveraging GPTFace, we overcome the issue of missing textual descriptions, expanding the applicability of CaptionFace to single-modal FR datasets. Additionally, we introduce a multi-scale feature alignment (MSFA) module to ensure semantic alignment between face-caption pairs at different granularities. Furthermore, we introduce an attribute-aware loss and perform knowledge adaptation to specifically adapt textual knowledge from facial features. Extensive experiments on three face-caption datasets and various unconstrained single-modal benchmark datasets demonstrate that CaptionFace significantly outperforms state-of-the-art FR models and existing TGFR approaches.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 2","pages":"195-209"},"PeriodicalIF":0.0,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143698332","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sweat Gland Enhancement Method for Fingertip OCT Images Based on Generative Adversarial Network","authors":"Qingran Miao;Haixia Wang;Yilong Zhang;Rui Yan;Yipeng Liu","doi":"10.1109/TBIOM.2024.3459812","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3459812","url":null,"abstract":"Sweat pores are gaining recognition as a secure, reliable, and identifiable third-level fingerprint feature. Challenges arise in collecting sweat pores when fingers are contaminated, dry, or damaged, leading to unclear or vanished surface sweat pores. Optical Coherence Tomography (OCT) has been applied in the collection of fingertip biometric features. The sweat pores mapped from the subcutaneous sweat glands collected by OCT possess higher security and stability. However, speckle noise in OCT images can blur sweat glands making segmentation and extraction difficult. Traditional denoising methods cause unclear sweat gland contours and structural loss due to smearing and excessive smoothing. Deep learning-based methods have not achieved good results due to the lack of clean images as ground-truth. This paper proposes a sweat gland enhancement method for fingertip OCT images based on Generative Adversarial Network (GAN). It can effectively remove speckle noise while eliminating irrelevant structures and repairing the lost structure of sweat glands, ultimately improving the accuracy of sweat gland segmentation and extraction. To the best knowledge, it is the first time that sweat gland enhancement is investigated and proposed. In this method, a paired dataset generation strategy is proposed, which can extend few manually enhanced ground-truth into a high-quality paired dataset. An improved Pix2Pix for sweat gland enhancement is proposed, with the addition of a perceptual loss to mitigate structural distortions during the image translation process. It’s worth noting that after obtaining the paired dataset, any advanced supervised image-to-image translation network can be adapted into our framework for enhancement. Experiments are carried out to verify the effectiveness of the proposed method.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"6 4","pages":"550-560"},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142713974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Are Synthetic Datasets Reliable for Benchmarking Generalizable Person Re-Identification?","authors":"Cuicui Kang","doi":"10.1109/TBIOM.2024.3459828","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3459828","url":null,"abstract":"Recent studies show that models trained on synthetic datasets are able to achieve better generalizable person re-identification (GPReID) performance than that trained on public real-world datasets. On the other hand, due to the limitations of real-world person ReID datasets, it would also be important and interesting to use large-scale synthetic datasets as test sets to benchmark person ReID algorithms. Yet this raises a critical question: are synthetic datasets reliable for benchmarking generalizable person re-identification? In the literature there is no evidence showing this. To address this, we design a method called Pairwise Ranking Analysis (PRA) to quantitatively measure the ranking similarity, and a subsequent method called Metric-Independent Statistical Test (MIST) to perform the statistical test of identical distributions. Specifically, we employ Kendall rank correlation coefficients to evaluate pairwise similarity values between algorithm rankings on different datasets. Then, after removing metric dependency via PRA, a non-parametric two-sample Kolmogorov-Smirnov (KS) test is performed for the judgement of whether algorithm ranking correlations between synthetic and real-world datasets and those only between real-world datasets lie in identical distributions. We conduct comprehensive experiments, with twelve representative algorithms, three popular real-world person ReID datasets, and three recently released large-scale synthetic datasets. Through the designed PRA and MIST methods and comprehensive evaluations, we conclude that the recent large-scale synthetic datasets ClonedPerson, UnrealPerson and RandPerson can be reliably used to benchmark GPReID, statistically the same as real-world datasets. Therefore, this study guarantees the usage of synthetic datasets for both source training set and target testing set, with completely no privacy concerns from real-world surveillance data. Besides, the study in this paper might also inspire future designs of synthetic datasets, as the resulting p-values via the proposed MIST method can also be used to assess the reliability of a synthetic dataset for benchmarking algorithms.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 1","pages":"146-155"},"PeriodicalIF":0.0,"publicationDate":"2024-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Expression Recognition Based on Feature Representation Learning and Clustering-Based Attention Mechanism","authors":"Lianghai Jin;Liyuan Guo","doi":"10.1109/TBIOM.2024.3454975","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3454975","url":null,"abstract":"Facial expression recognition (FER) plays an important role in many computer vision applications. Generally, FER networks are trained based on the annotated labels or the probability distribution that an expression belongs to seven expression categories. However, the quality of annotations is heavily affected by ambiguous and indistinguishable facial expressions caused by compound and mixed emotions. Furthermore, it is difficult to annotate the seven-dimensional labels (probability distributions). To address these problems, this paper proposes a new FER network model. This model represents each type of facial expression as a high dimensional feature vector, based on which the FER network is trained. The high-dimensional feature representation of each facial expression class is learned by a special binary feature representation generator network. We also develop a clustering-based group split attention mechanism, which enhances the emotion-related features effectively. The experimental results on two lab-controlled datasets and four in-the-wild datasets demonstrate the effectiveness of the proposed FER model by showing clear performance improvements over other state-of-the-art FER methods. Codes are available at <uri>https://github.com/Gabrella/GLA-FNet</uri>.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 2","pages":"182-194"},"PeriodicalIF":0.0,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143698331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Seeing is Not Believing: An Identity Hider for Human Vision Privacy Protection","authors":"Tao Wang;Yushu Zhang;Zixuan Yang;Xiangli Xiao;Hua Zhang;Zhongyun Hua","doi":"10.1109/TBIOM.2024.3449849","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3449849","url":null,"abstract":"Massive captured face images are stored in the database for the identification of individuals. However, these images can be observed unintentionally by data examiners, which is not at the will of individuals and may cause privacy violations. Existing protection schemes can maintain identifiability but slightly change the facial appearance, rendering it still susceptible to the visual perception of the original identity by data examiners. In this paper, we propose an effective identity hider for human vision protection, which can significantly change appearance to visually hide identity while allowing identification for face recognizers. Concretely, the identity hider benefits from two specially designed modules: 1) The virtual face generation module generates a virtual face with a new appearance by manipulating the latent space of StyleGAN2. In particular, the virtual face has a similar parsing map to the original face, supporting other vision tasks such as head pose detection. 2) The appearance transfer module transfers the appearance of the virtual face into the original face via attribute replacement. Meanwhile, identity information can be preserved well with the help of the disentanglement networks. In addition, diversity and background preservation are supported to meet various requirements. Extensive experiments demonstrate that the proposed identity hider achieves excellent performance on privacy protection and identifiability preservation.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 2","pages":"170-181"},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143698237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-Scale Fully-Unsupervised Re-Identification","authors":"Gabriel Bertocco;Fernanda Andaló;Terrance E. Boult;Anderson Rocha","doi":"10.1109/TBIOM.2024.3446964","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3446964","url":null,"abstract":"Fully-unsupervised Person and Vehicle Re-Identification have received increasing attention due to their broad applicability in areas such as surveillance, forensics, event understanding, and smart cities, without requiring any manual annotation. However, most of the prior art has been evaluated in datasets that have just a couple thousand samples. Such small-data setups often allow the use of costly techniques in terms of time and memory footprints, such as Re-Ranking, to improve clustering results. Moreover, some previous work even pre-selects the best clustering hyper-parameters for each dataset, which is unrealistic in a large-scale fully-unsupervised scenario. In this context, this work tackles a more realistic scenario and proposes two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from <inline-formula> <tex-math>$mathcal {O}(n^{2})$ </tex-math></inline-formula> to <inline-formula> <tex-math>$mathcal {O}(kn)$ </tex-math></inline-formula> with <inline-formula> <tex-math>$k ll n$ </tex-math></inline-formula>. To avoid the need for pre-selection of specific hyper-parameter values for the clustering algorithm, we also present a novel scheduling algorithm that adjusts the density parameter during training, to leverage the diversity of samples and keep the learning robust to noisy labeling. Finally, due to the complementary knowledge learned by different models in an ensemble, we also introduce a co-training strategy that relies upon the permutation of predicted pseudo-labels, among the backbones, with no need for any hyper-parameters or weighting optimization. The proposed methodology outperforms the state-of-the-art methods in well-known benchmarks and in the challenging large-scale Veri-Wild dataset, with a faster and memory-efficient Re-Ranking strategy, and a large-scale, noisy-robust, and ensemble-based learning approach.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 2","pages":"156-169"},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143698181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Data Perspective on Ethical Challenges in Voice Biometrics Research","authors":"Anna Leschanowsky;Casandra Rusti;Carolyn Quinlan;Michaela Pnacek;Lauriane Gorce;Wiebke Hutiri","doi":"10.1109/TBIOM.2024.3446846","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3446846","url":null,"abstract":"Speaker recognition technology, deployed in sectors like banking, education, recruitment, immigration, law enforcement, and healthcare, relies heavily on biometric data. However, the ethical implications and biases inherent in the datasets driving this technology have not been fully explored. Through a longitudinal study of close to 700 papers published at the ISCA Interspeech Conference in the years 2012 to 2021, we investigate how dataset use has evolved alongside the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field and examines their usage patterns. The analysis reveals significant shifts in data practices since the advent of deep learning: a small number of datasets dominate speaker recognition training and evaluation, and the majority of studies evaluate their systems on a single dataset. For four key datasets–Switchboard, Mixer, VoxCeleb, and ASVspoof–we conduct a detailed analysis of metadata and collection methods to assess ethical concerns and privacy risks. Our study highlights numerous challenges related to sampling bias, re-identification, consent, disclosure of sensitive information and security risks in speaker recognition datasets, and emphasizes the need for more representative, fair, and privacy-aware data collection in this domain.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 1","pages":"118-131"},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Facial Biometrics in the Social Media Era: An In-Depth Analysis of the Challenge Posed by Beautification Filters","authors":"Nelida Mirabet-Herranz;Chiara Galdi;Jean-Luc Dugelay","doi":"10.1109/TBIOM.2024.3438928","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3438928","url":null,"abstract":"Automatic beautification through social media filters has gained popularity in recent years. Users apply face filters to adhere to beauty standards, posing challenges to the reliability of facial images and complicating tasks like automatic face recognition. In this work, the impact of digital beautification is assessed, focusing on the most popular social media filters from three different platforms, on a range of AI-based face analysis technologies: face recognition, gender classification, apparent age estimation, weight estimation, and heart rate assessment. Tests are performed on our extended Facial Features Modification Filters dataset, containing a total of 24312 images and 260 videos. An extensive set of experiments is carried out to show through quantitative metrics the impact of beautification filters on the performance of the different face analysis tasks. The results reveal that employing filters significantly disrupts soft biometric estimation, resulting in a pronounced impact on the performance of weight and heart rate networks. Nevertheless, we observe that certain less aggressive filters do not adversely affect face recognition and gender estimation networks, in some instances enhancing their performances. Scripts and more information are available at \u0000<uri>https://github.com/nmirabeth/filters_biometrics</uri>\u0000.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 1","pages":"108-117"},"PeriodicalIF":0.0,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TriGait: Hybrid Fusion Strategy for Multimodal Alignment and Integration in Gait Recognition","authors":"Yan Sun;Xueling Feng;Xiaolei Liu;Liyan Ma;Long Hu;Mark S. Nixon","doi":"10.1109/TBIOM.2024.3435046","DOIUrl":"https://doi.org/10.1109/TBIOM.2024.3435046","url":null,"abstract":"Due to the inherent limitations of single modalities, multimodal fusion has become increasingly popular in many computer vision fields, leveraging the complementary advantages of unimodal methods. As an emerging biometric technology with great application potential, gait recognition faces similar challenges. The prevailing silhouette-based and skeleton-based gait recognition methods have their respective limitations: one focuses on appearance information while neglecting structural details, and the other does the opposite. Multimodal gait recognition, which combines silhouette and skeleton, promises more robust predictions. However, it is essential and difficult to explore the implicit interaction between dense pixels and discrete coordinate points. Most existing multimodal gait recognition methods basically concatenated features from silhouette and skeleton and did not fully exploit complementarity between them. This paper presents a hybrid fusion strategy called TriGait, which is a three-branch structural model and thoroughly explores the interaction and complementarity of the two modalities. To solve the problem of data heterogeneity and explore the mutual information of two modalities, we propose the use of a cross-modal token generator (CMTG) within a fusion branch to align and fuse the low-level features of the two modalities. Additionally, TriGait has two extra branches for extracting high-level semantic information from silhouette and skeleton. By combining low-level correlation information and high-level semantic information, TriGait provides a comprehensive and discriminative representation of a subject’s gait. Extensive experimental results on CASIA-B, Gait3D and OUMVLP demonstrate the effectiveness of TriGait. Remarkably, TriGait achieves the rank-1 mean accuracy of 96.6%, 61.4% and 91.1% on CASIA-B, Gait3D and OUMVLP respectively, outperforming the state-of-the-art methods. The source code is available at: \u0000<uri>https://github.com/YanSun-github/TriGait/</uri>\u0000.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 1","pages":"82-94"},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}