MambaGait: Gait recognition approach combining explicit representation and implicit state space model
Haijun Xiong, Bin Feng, Bang Wang, Xinggang Wang, Wenyu Liu
Image and Vision Computing, Volume 161, Article 105597
DOI: 10.1016/j.imavis.2025.105597
Published: 2025-06-09 (Journal Article) · JCR Q2, Computer Science, Artificial Intelligence · Impact Factor 4.2
Code: https://github.com/Haijun-Xiong/MambaGait
Citations: 0
Abstract
Gait recognition aims to identify pedestrians based on their unique walking patterns and has gained significant attention due to its wide range of applications. Mamba, a State Space Model, has shown great potential in modeling long sequences. However, its limited ability to capture local details hinders its effectiveness in fine-grained tasks like gait recognition. Moreover, similar to convolutional neural networks and transformers, Mamba primarily relies on implicit learning, which is constrained by the sparsity of binary silhouette sequences. Inspired by explicit feature representations in scene rendering, we introduce a novel gait descriptor, the Explicit Spatial Representation Field (ESF). It represents silhouette images as directed distance fields, enhancing their sensitivity to gait motion and facilitating richer spatiotemporal feature extraction. To further improve Mamba’s ability to capture local details, we propose the Temporal Window Switch Mamba Block (TWSM), which effectively extracts local and global spatiotemporal features via bidirectional temporal window switching. By combining explicit representation and implicit Mamba modeling, MambaGait achieves state-of-the-art performance on four challenging datasets (GREW, Gait3D, CCPG, and SUSTech1K). Code: https://github.com/Haijun-Xiong/MambaGait.
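The ESF descriptor represents each binary silhouette as a directed distance field, so that every pixel carries its distance to the silhouette boundary rather than a bare 0/1 value. As a rough illustration of this idea, the sketch below builds a generic signed distance field from a silhouette with SciPy's Euclidean distance transform: interior pixels get positive distances to the boundary, exterior pixels negative ones. The function name and the exact sign convention are illustrative assumptions, not the paper's precise ESF formulation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def explicit_spatial_field(silhouette: np.ndarray) -> np.ndarray:
    """Map a binary silhouette to a signed distance field.

    Interior pixels receive the (positive) Euclidean distance to the
    nearest background pixel; exterior pixels receive the negative
    distance to the nearest foreground pixel. This is a generic
    signed-distance construction, assumed here as a stand-in for the
    paper's Explicit Spatial Representation Field.
    """
    sil = silhouette.astype(bool)
    inside = distance_transform_edt(sil)    # distance to nearest background pixel
    outside = distance_transform_edt(~sil)  # distance to nearest foreground pixel
    return inside - outside

# Toy example: a 3x3 foreground square centered in a 5x5 frame.
silhouette = np.zeros((5, 5))
silhouette[1:4, 1:4] = 1
field = explicit_spatial_field(silhouette)
```

Compared with the raw binary mask, the field varies smoothly in space, which is what makes it more sensitive to frame-to-frame gait motion: a small shift of the silhouette changes the distance values everywhere near the boundary, not just at the few pixels that flip.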
About the journal:
Image and Vision Computing has as its primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster deeper understanding in the discipline by encouraging quantitative comparison and performance evaluation of the proposed methodology. Coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, and image databases.