MambaGait: Gait recognition approach combining explicit representation and implicit state space model
Haijun Xiong, Bin Feng, Bang Wang, Xinggang Wang, Wenyu Liu
Image and Vision Computing, Volume 161, Article 105597
DOI: 10.1016/j.imavis.2025.105597
Published: 2025-06-09 (Journal Article) · JCR Q2, Computer Science, Artificial Intelligence · Impact Factor 4.2
Code: https://github.com/Haijun-Xiong/MambaGait
Citations: 0
Abstract
Gait recognition aims to identify pedestrians based on their unique walking patterns and has gained significant attention due to its wide range of applications. Mamba, a State Space Model, has shown great potential in modeling long sequences. However, its limited ability to capture local details hinders its effectiveness in fine-grained tasks like gait recognition. Moreover, similar to convolutional neural networks and transformers, Mamba primarily relies on implicit learning, which is constrained by the sparsity of binary silhouette sequences. Inspired by explicit feature representations in scene rendering, we introduce a novel gait descriptor, the Explicit Spatial Representation Field (ESF). It represents silhouette images as directed distance fields, enhancing their sensitivity to gait motion and facilitating richer spatiotemporal feature extraction. To further improve Mamba’s ability to capture local details, we propose the Temporal Window Switch Mamba Block (TWSM), which effectively extracts local and global spatiotemporal features via bidirectional temporal window switching. By combining explicit representation and implicit Mamba modeling, MambaGait achieves state-of-the-art performance on four challenging datasets (GREW, Gait3D, CCPG, and SUSTech1K). Code: https://github.com/Haijun-Xiong/MambaGait.
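The ESF descriptor represents each binary silhouette as a directed distance field, so that every pixel carries its distance to the silhouette boundary rather than a bare 0/1 value. As a rough illustration of this idea, the sketch below builds a generic signed distance field from a silhouette with SciPy's Euclidean distance transform: interior pixels get positive distances to the boundary, exterior pixels negative ones. The function name and the exact sign convention are illustrative assumptions, not the paper's precise ESF formulation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def explicit_spatial_field(silhouette: np.ndarray) -> np.ndarray:
    """Map a binary silhouette to a signed distance field.

    Interior pixels receive the (positive) Euclidean distance to the
    nearest background pixel; exterior pixels receive the negative
    distance to the nearest foreground pixel. This is a generic
    signed-distance construction, assumed here as a stand-in for the
    paper's Explicit Spatial Representation Field.
    """
    sil = silhouette.astype(bool)
    inside = distance_transform_edt(sil)    # distance to nearest background pixel
    outside = distance_transform_edt(~sil)  # distance to nearest foreground pixel
    return inside - outside

# Toy example: a 3x3 foreground square centered in a 5x5 frame.
silhouette = np.zeros((5, 5))
silhouette[1:4, 1:4] = 1
field = explicit_spatial_field(silhouette)
```

Compared with the raw binary mask, the field varies smoothly in space, which is what makes it more sensitive to frame-to-frame gait motion: a small shift of the silhouette changes the distance values everywhere near the boundary, not just at the few pixels that flip.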
About the journal:
Image and Vision Computing has as its primary aim the provision of an effective medium of interchange for the results of high-quality theoretical and applied research fundamental to all aspects of image interpretation and computer vision. The journal publishes work that proposes new image interpretation and computer vision methodology or addresses the application of such methods to real-world scenes. It seeks to foster deeper understanding in the discipline by encouraging quantitative comparison and performance evaluation of the proposed methodology. Coverage includes: image interpretation, scene modelling, object recognition and tracking, shape analysis, monitoring and surveillance, active vision and robotic systems, SLAM, biologically inspired computer vision, motion analysis, stereo vision, document image understanding, character and handwritten text recognition, face and gesture recognition, biometrics, vision-based human-computer interaction, human activity and behavior understanding, data fusion from multiple sensor inputs, and image databases.