{"title":"Distillation-Guided Representation Learning for Unconstrained Video Human Authentication","authors":"Yuxiang Guo;Siyuan Huang;Ram Prabhakar Kathirvel;Chun Pong Lau;Rama Chellappa;Cheng Peng","doi":"10.1109/TBIOM.2025.3595366","DOIUrl":null,"url":null,"abstract":"Human authentication is an important and challenging biometric task, particularly from unconstrained videos. While body recognition is a popular approach, gait recognition holds the promise of robustly identifying subjects based on walking patterns instead of appearance information. Previous gait-based approaches have performed well for curated indoor scenes; however, they tend to underperform in unconstrained situations. To address these challenges, we propose a framework, termed Holistic GAit DEtection and Recognition (H-GADER), for human authentication in challenging outdoor scenarios. Specifically, H-GADER leverages a Double Helical Signature to detect segments that contain human movement and builds discriminative features through a novel gait recognition method. To further enhance robustness, H-GADER encodes viewpoint information in its architecture, and distills learned representations from an auxiliary RGB recognition model; this allows H-GADER to learn from maximum amount of data at training time. At test time, H-GADER infers solely from the silhouette modality. Furthermore, we introduce a body recognition model through semantic, large-scale, self-supervised training to complement gait recognition. By conditionally fusing gait and body representations based on the presence/absence of gait information as decided by the gait detection, we demonstrate significant improvements compared to when a single modality or a naive feature ensemble is used. We evaluate our method on multiple existing State-of-The-Arts (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets, especially on the BRIAR dataset, which features unconstrained, long-distance videos, achieving a 28.9% improvement.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 4","pages":"940-952"},"PeriodicalIF":5.0000,"publicationDate":"2025-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11111687","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on biometrics, behavior, and identity science","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/11111687/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Human authentication is an important and challenging biometric task, particularly from unconstrained videos. While body recognition is a popular approach, gait recognition holds the promise of robustly identifying subjects based on walking patterns rather than appearance information. Previous gait-based approaches have performed well on curated indoor scenes; however, they tend to underperform in unconstrained situations. To address these challenges, we propose a framework, termed Holistic GAit DEtection and Recognition (H-GADER), for human authentication in challenging outdoor scenarios. Specifically, H-GADER leverages a Double Helical Signature to detect segments that contain human movement and builds discriminative features through a novel gait recognition method. To further enhance robustness, H-GADER encodes viewpoint information in its architecture and distills learned representations from an auxiliary RGB recognition model; this allows H-GADER to learn from the maximum amount of data available at training time. At test time, H-GADER infers solely from the silhouette modality. Furthermore, we introduce a body recognition model trained through semantic, large-scale, self-supervised learning to complement gait recognition. By conditionally fusing gait and body representations based on the presence or absence of gait information, as determined by the gait detection module, we demonstrate significant improvements over using a single modality or a naive feature ensemble. We evaluate our method on multiple existing state-of-the-art (SoTA) gait baselines and demonstrate consistent improvements on indoor and outdoor datasets, especially on the BRIAR dataset, which features unconstrained, long-distance videos, achieving a 28.9% improvement.
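To make the two core mechanisms in the abstract concrete, below is a minimal PyTorch sketch of (1) distilling an auxiliary RGB recognition model's representation into a silhouette-based gait encoder at training time, and (2) conditionally fusing gait and body embeddings depending on whether the gait detector found usable walking motion. All module names, shapes, and the cosine distillation objective are illustrative assumptions; the paper's actual H-GADER architecture and losses are not specified at this level of detail in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SilhouetteGaitEncoder(nn.Module):
    """Placeholder silhouette encoder standing in for the gait branch."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, silhouettes: torch.Tensor) -> torch.Tensor:
        # silhouettes: (B, 1, H, W) binary masks; returns (B, feat_dim)
        return self.encoder(silhouettes)

def distillation_loss(gait_feat: torch.Tensor, rgb_feat: torch.Tensor) -> torch.Tensor:
    """Pull the silhouette embedding toward the frozen RGB teacher's
    embedding. A cosine objective is one common choice for feature
    distillation (an assumption here, not taken from the paper)."""
    return 1.0 - F.cosine_similarity(gait_feat, rgb_feat.detach(), dim=-1).mean()

def conditional_fusion(gait_feat: torch.Tensor,
                       body_feat: torch.Tensor,
                       gait_present: torch.Tensor) -> torch.Tensor:
    """Fuse gait and body embeddings only when the gait detector flags
    usable walking motion; otherwise fall back to the body embedding.
    gait_present: (B,) boolean mask from the gait detection stage."""
    fused = F.normalize(gait_feat + body_feat, dim=-1)  # naive additive fusion
    body_only = F.normalize(body_feat, dim=-1)
    return torch.where(gait_present.unsqueeze(-1), fused, body_only)

# Usage sketch: at training time the distillation loss supplements the
# recognition loss; at test time only silhouettes (and the body branch)
# are used, matching the abstract's silhouette-only inference claim.
encoder = SilhouetteGaitEncoder()
sil = torch.rand(4, 1, 64, 44)            # toy silhouette batch
gait_feat = encoder(sil)
rgb_feat = torch.rand(4, 256)             # stand-in for the RGB teacher output
body_feat = torch.rand(4, 256)            # stand-in for the body-recognition branch
loss = distillation_loss(gait_feat, rgb_feat)
embedding = conditional_fusion(gait_feat, body_feat, torch.tensor([True, True, False, True]))

The conditional branch reflects the abstract's key design point: when the Double Helical Signature stage finds no walking segment, the gait feature is unreliable, so the system should not blend it in; a naive feature ensemble would use both features unconditionally.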