Human gait recognition using joint spatiotemporal modulation in deep convolutional neural networks

IF 3.1, Zone 4 (Computer Science), JCR Q2: COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Visual Communication and Image Representation, Pub Date: 2024-10-28, DOI: 10.1016/j.jvcir.2024.104322

Mohammad Iman Junaid, Allam Jaya Prakash, Samit Ari

Volume 105, Article 104322. Publication type: Journal Article. Open access: no.

Publisher page: https://www.sciencedirect.com/science/article/pii/S1047320324002785

Citations: 0

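Abstract

Gait, a person's distinctive walking pattern, offers a promising biometric modality for surveillance applications. Unlike fingerprints or iris scans, gait can be captured from a distance without the subject's direct cooperation or awareness, which makes it well suited to surveillance and security applications. Traditional convolutional neural networks (CNNs) often struggle with the inherent variations within video data, limiting their effectiveness in gait recognition. This work introduces a joint spatial–temporal modulation network designed to overcome this limitation. By extracting discriminative feature representations across varying frame levels, the network leverages both spatial and temporal variations within video sequences. The proposed architecture integrates attention-based CNNs for spatial feature extraction and a Bidirectional Long Short-Term Memory (Bi-LSTM) network with a temporal attention module to analyse temporal dynamics. The use of attention in the spatial and temporal blocks enhances the network's ability to focus on the most relevant segments of the video data. This can improve efficiency, since the combined approach enhances learning capabilities when processing complex gait videos. The effectiveness of the proposed network was evaluated on two major datasets, CASIA-B and OU-MVLP. Experimental analysis on CASIA-B shows that the proposed network achieves an average rank-1 accuracy of 98.20% for normal walking, 94.50% for walking with a bag, and 80.40% for clothing scenarios. The proposed network also achieved an accuracy of 89.10% on OU-MVLP. These results show the proposed method's ability to generalize to large-scale data and to consistently outperform current state-of-the-art gait recognition techniques.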
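The abstract names two ideas that are easy to make concrete: attention weights over the frames of a sequence, and rank-1 accuracy as the reported metric. The following is a minimal sketch of both in plain Python, not the authors' implementation. The function names `temporal_attention_pool` and `rank1_accuracy`, the use of precomputed per-frame scores, and the squared Euclidean distance are all assumptions made for illustration; in the paper the scores would come from the learned temporal attention module on top of the Bi-LSTM, whose details are not given in the abstract.

```python
import math


def temporal_attention_pool(frame_features, scores):
    """Collapse per-frame feature vectors into one vector using softmax weights.

    frame_features: list of T equal-length lists of floats (one per frame).
    scores: list of T floats, hypothetical relevance scores, one per frame.
    Returns (pooled_vector, weights).
    """
    if not scores or len(frame_features) != len(scores):
        raise ValueError("need at least one frame and exactly one score per frame")
    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of frame features, dimension by dimension.
    dim = len(frame_features[0])
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return pooled, weights


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose nearest gallery embedding has the same identity.

    probes, gallery: lists of (identity, embedding) pairs.
    Distance is squared Euclidean (an assumption; other metrics are common).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    hits = 0
    for identity, embedding in probes:
        nearest_id, _ = min(gallery, key=lambda item: sq_dist(embedding, item[1]))
        if nearest_id == identity:
            hits += 1
    return hits / len(probes)
```

With equal scores the pooling reduces to a plain average over frames; a frame with a much larger score dominates the pooled vector, which is the sense in which attention focuses on the most relevant segments. Rank-1 accuracy counts a probe sequence as correct only when its single closest gallery sequence belongs to the same person.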

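Source journal

Journal of Visual Communication and Image Representation (Engineering & Technology: Computer Science, Software Engineering)

CiteScore: 5.40

Self-citation rate: 11.50%

Articles published: 188

Review time: 9.9 months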

About the journal: The Journal of Visual Communication and Image Representation publishes papers on state-of-the-art visual communication and image representation, with emphasis on novel technologies and theoretical work in this multidisciplinary area of pure and applied research. The field of visual communication and image representation is considered in its broadest sense and covers both digital and analog aspects as well as processing and communication in biological visual systems.