{"title":"GaitTriViT and GaitVViT: Transformer-based methods emphasizing spatial or temporal aspects in gait recognition.","authors":"Hongyun Sheng","doi":"10.7717/peerj-cs.3061","DOIUrl":null,"url":null,"abstract":"<p><p>In image recognition tasks, subjects with long distances and low resolution remain a challenge, whereas gait recognition, identifying subjects by walking patterns, is considered one of the most promising biometric technologies due to its stability and efficiency. Previous gait recognition methods mostly focused on constructing a sophisticated model structure for better model performance during evaluation. Moreover, these methods are primarily based on traditional convolutional neural networks (CNNs) due to the dominance of CNNs in computer vision. However, since the alternative form of Transformer, named Vision Transformers (ViTs), has been introduced into the computer vision field, the ViTs have gained strong attention for its outstanding performance in various tasks. Thus, unlike previous methods, this project introduces two Transformer-based methods: a completely ViTs-based method GaitTriViT, and a Video Vision Transformer (Video ViT) based method GaitVViT. The GaitTriViT leverages the ViTs to gain more fine-grained spatial features, while GaitVViT enhances the capacity of temporal extraction. This work evaluates their performances and the results show the still-existing gaps and several encouraging outperforms compared with current state-of-the-art (SOTA), demonstrating the difficulties and challenges these Transformer-based methods will encounter continuously. However, the future of Vision Transformers in gait recognition is still promising.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"11 ","pages":"e3061"},"PeriodicalIF":2.5000,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12453820/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.3061","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
In image recognition tasks, subjects captured at long distances and low resolution remain a challenge, whereas gait recognition, which identifies subjects by their walking patterns, is considered one of the most promising biometric technologies due to its stability and efficiency. Previous gait recognition methods have mostly focused on constructing sophisticated model structures to improve evaluation performance, and they are primarily built on convolutional neural networks (CNNs), reflecting the long-standing dominance of CNNs in computer vision. Since Vision Transformers (ViTs), a Transformer-based alternative, were introduced into computer vision, however, they have attracted strong attention for their outstanding performance across a variety of tasks. Unlike previous methods, this project therefore introduces two Transformer-based approaches: GaitTriViT, a method built entirely on ViTs, and GaitVViT, a method based on the Video Vision Transformer (Video ViT). GaitTriViT leverages ViTs to extract more fine-grained spatial features, while GaitVViT strengthens temporal feature extraction. This work evaluates both methods; the results reveal remaining gaps as well as several encouraging improvements over the current state of the art (SOTA), illustrating the difficulties these Transformer-based methods will continue to encounter. Nevertheless, the future of Vision Transformers in gait recognition remains promising.
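The abstract does not spell out implementation details, so the following is only a minimal PyTorch sketch of the two design axes it contrasts: per-frame spatial attention over silhouette patches (a GaitTriViT-style idea) versus joint attention over tokens from every frame of a clip (a GaitVViT-style idea). All class names, patch sizes, embedding dimensions, and the omission of positional embeddings are illustrative assumptions, not the authors' architecture.

```python
# Hypothetical sketch (not the paper's code): contrasts per-frame spatial attention
# with joint spatio-temporal attention over a silhouette clip. Dimensions, names,
# and the missing positional embeddings are simplifying assumptions.
import torch
import torch.nn as nn


def patchify(frames: torch.Tensor, patch: int = 16) -> torch.Tensor:
    """Split (B, T, C, H, W) silhouette frames into flattened patch tokens."""
    b, t, c, h, w = frames.shape
    x = frames.reshape(b * t, c, h // patch, patch, w // patch, patch)
    x = x.permute(0, 2, 4, 1, 3, 5).reshape(b * t, -1, c * patch * patch)
    return x  # (B*T, num_patches, patch_dim)


class SpatialViTEncoder(nn.Module):
    """GaitTriViT-style idea: a plain ViT attends over the patches of each frame,
    producing a fine-grained spatial embedding per frame; time is only pooled."""

    def __init__(self, patch_dim: int = 256, dim: int = 128, depth: int = 4):
        super().__init__()
        self.proj = nn.Linear(patch_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b, t = frames.shape[:2]
        tokens = self.proj(patchify(frames))        # (B*T, P, dim)
        feats = self.encoder(tokens).mean(dim=1)    # one embedding per frame
        return feats.reshape(b, t, -1).mean(dim=1)  # simple temporal pooling


class VideoViTEncoder(nn.Module):
    """GaitVViT-style idea: tokens from all frames are attended jointly, so the
    encoder can model temporal structure, not just per-frame appearance."""

    def __init__(self, patch_dim: int = 256, dim: int = 128, depth: int = 4):
        super().__init__()
        self.proj = nn.Linear(patch_dim, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        b = frames.shape[0]
        tokens = self.proj(patchify(frames))              # (B*T, P, dim)
        tokens = tokens.reshape(b, -1, tokens.shape[-1])  # (B, T*P, dim)
        return self.encoder(tokens).mean(dim=1)           # clip-level embedding


if __name__ == "__main__":
    clip = torch.rand(2, 8, 1, 64, 64)        # batch of 8-frame silhouette clips
    print(SpatialViTEncoder()(clip).shape)    # torch.Size([2, 128])
    print(VideoViTEncoder()(clip).shape)      # torch.Size([2, 128])
```

The sketch only highlights where attention is applied: SpatialViTEncoder mixes information within each frame and pools over time afterwards, whereas VideoViTEncoder lets every patch token attend to tokens from other frames, which is what gives a Video ViT its stronger temporal modeling capacity.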
Journal overview:
PeerJ Computer Science is the new open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.