TriGait: Hybrid Fusion Strategy for Multimodal Alignment and Integration in Gait Recognition

Yan Sun, Xueling Feng, Xiaolei Liu, Liyan Ma, Long Hu, Mark S. Nixon
{"title":"TriGait: Hybrid Fusion Strategy for Multimodal Alignment and Integration in Gait Recognition","authors":"Yan Sun;Xueling Feng;Xiaolei Liu;Liyan Ma;Long Hu;Mark S. Nixon","doi":"10.1109/TBIOM.2024.3435046","DOIUrl":null,"url":null,"abstract":"Due to the inherent limitations of single modalities, multimodal fusion has become increasingly popular in many computer vision fields, leveraging the complementary advantages of unimodal methods. As an emerging biometric technology with great application potential, gait recognition faces similar challenges. The prevailing silhouette-based and skeleton-based gait recognition methods have their respective limitations: one focuses on appearance information while neglecting structural details, and the other does the opposite. Multimodal gait recognition, which combines silhouette and skeleton, promises more robust predictions. However, it is essential and difficult to explore the implicit interaction between dense pixels and discrete coordinate points. Most existing multimodal gait recognition methods basically concatenated features from silhouette and skeleton and did not fully exploit complementarity between them. This paper presents a hybrid fusion strategy called TriGait, which is a three-branch structural model and thoroughly explores the interaction and complementarity of the two modalities. To solve the problem of data heterogeneity and explore the mutual information of two modalities, we propose the use of a cross-modal token generator (CMTG) within a fusion branch to align and fuse the low-level features of the two modalities. Additionally, TriGait has two extra branches for extracting high-level semantic information from silhouette and skeleton. By combining low-level correlation information and high-level semantic information, TriGait provides a comprehensive and discriminative representation of a subject’s gait. Extensive experimental results on CASIA-B, Gait3D and OUMVLP demonstrate the effectiveness of TriGait. Remarkably, TriGait achieves the rank-1 mean accuracy of 96.6%, 61.4% and 91.1% on CASIA-B, Gait3D and OUMVLP respectively, outperforming the state-of-the-art methods. The source code is available at: \n<uri>https://github.com/YanSun-github/TriGait/</uri>\n.","PeriodicalId":73307,"journal":{"name":"IEEE transactions on biometrics, behavior, and identity science","volume":"7 1","pages":"82-94"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on biometrics, behavior, and identity science","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10612818/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

Due to the inherent limitations of single modalities, multimodal fusion has become increasingly popular in many computer vision fields, leveraging the complementary advantages of unimodal methods. As an emerging biometric technology with great application potential, gait recognition faces similar challenges. The prevailing silhouette-based and skeleton-based gait recognition methods have their respective limitations: one focuses on appearance information while neglecting structural details, and the other does the opposite. Multimodal gait recognition, which combines silhouette and skeleton, promises more robust predictions. However, exploring the implicit interaction between dense pixels and discrete coordinate points is both essential and difficult. Most existing multimodal gait recognition methods simply concatenate silhouette and skeleton features and do not fully exploit the complementarity between them. This paper presents a hybrid fusion strategy called TriGait, a three-branch model that thoroughly explores the interaction and complementarity of the two modalities. To address data heterogeneity and exploit the mutual information between the two modalities, we propose a cross-modal token generator (CMTG) within a fusion branch to align and fuse their low-level features. Additionally, TriGait has two extra branches for extracting high-level semantic information from silhouette and skeleton. By combining low-level correlation information and high-level semantic information, TriGait provides a comprehensive and discriminative representation of a subject's gait. Extensive experimental results on CASIA-B, Gait3D and OUMVLP demonstrate the effectiveness of TriGait. Remarkably, TriGait achieves rank-1 mean accuracies of 96.6%, 61.4%, and 91.1% on CASIA-B, Gait3D, and OUMVLP, respectively, outperforming the state-of-the-art methods. The source code is available at https://github.com/YanSun-github/TriGait/.
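The abstract only outlines the architecture at a high level. For intuition, below is a minimal PyTorch sketch of a TriGait-style three-branch model: one branch for silhouette appearance, one for skeleton structure, and a fusion branch whose learned query tokens cross-attend to low-level features of both modalities as a stand-in for the paper's cross-modal token generator (CMTG). All module names, layer sizes, and tensor shapes here are illustrative assumptions, not the authors' implementation; see the linked repository for the actual model.

```python
# Hedged sketch of a TriGait-style three-branch fusion model in PyTorch.
# Shapes, layer sizes, and the token-attention fusion are illustrative
# assumptions; they do not reproduce the paper's CMTG or backbones.
import torch
import torch.nn as nn


class SilhouetteBranch(nn.Module):
    """High-level appearance features from silhouette frames (B, T, 1, H, W)."""
    def __init__(self, out_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, out_dim)

    def forward(self, sil):                               # sil: (B, T, 1, H, W)
        b, t = sil.shape[:2]
        f = self.cnn(sil.flatten(0, 1)).flatten(1)        # (B*T, 64)
        return self.fc(f).view(b, t, -1).mean(1)          # temporal pooling -> (B, D)


class SkeletonBranch(nn.Module):
    """High-level structural features from joint coordinates (B, T, J, C)."""
    def __init__(self, num_joints=17, in_ch=2, out_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_joints * in_ch, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, skel):                              # skel: (B, T, J, C)
        return self.mlp(skel.flatten(2)).mean(1)          # temporal pooling -> (B, D)


class CrossModalFusionBranch(nn.Module):
    """Fusion branch: project per-frame low-level features of both modalities
    into a shared token space and let learned query tokens cross-attend to
    them (a simplified stand-in for the paper's CMTG)."""
    def __init__(self, dim=128, num_tokens=8):
        super().__init__()
        self.sil_proj = nn.LazyLinear(dim)                # flattened pixels -> token
        self.skel_proj = nn.LazyLinear(dim)               # flattened joints -> token
        self.tokens = nn.Parameter(torch.randn(num_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, sil, skel):
        b, t = sil.shape[:2]
        sil_tok = self.sil_proj(sil.flatten(0, 1).flatten(1)).view(b, t, -1)
        skel_tok = self.skel_proj(skel.flatten(2))        # (B, T, dim)
        kv = torch.cat([sil_tok, skel_tok], dim=1)        # (B, 2T, dim)
        q = self.tokens.unsqueeze(0).expand(b, -1, -1)    # learned query tokens
        fused, _ = self.attn(q, kv, kv)                   # (B, num_tokens, dim)
        return fused.mean(1)                              # (B, dim)


class TriGaitSketch(nn.Module):
    """Concatenate the three branch embeddings into one gait representation."""
    def __init__(self):
        super().__init__()
        self.sil_branch = SilhouetteBranch()
        self.skel_branch = SkeletonBranch()
        self.fusion_branch = CrossModalFusionBranch()

    def forward(self, sil, skel):
        return torch.cat([
            self.sil_branch(sil),
            self.skel_branch(skel),
            self.fusion_branch(sil, skel),
        ], dim=-1)                                        # (B, 3*128) embedding


if __name__ == "__main__":
    model = TriGaitSketch()
    sil = torch.rand(2, 30, 1, 64, 44)                    # 30 silhouette frames
    skel = torch.rand(2, 30, 17, 2)                       # 17 2-D joints per frame
    print(model(sil, skel).shape)                         # torch.Size([2, 384])
```

Concatenating the fusion branch's low-level correlation features with the two high-level unimodal embeddings mirrors the hybrid-fusion idea described in the abstract; the released code's CMTG and temporal modelling are considerably more elaborate than this sketch.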