Radar gait recognition using Dual-branch Swin Transformer with Asymmetric Attention Fusion

IF 7.5 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Pub Date : 2024-10-31 DOI:10.1016/j.patcog.2024.111101

Wentao He , Jianfeng Ren , Ruibin Bai , Xudong Jiang

{"title":"Radar gait recognition using Dual-branch Swin Transformer with Asymmetric Attention Fusion","authors":"Wentao He , Jianfeng Ren , Ruibin Bai , Xudong Jiang","doi":"10.1016/j.patcog.2024.111101","DOIUrl":null,"url":null,"abstract":"<div><div>Video-based gait recognition suffers from potential privacy issues and performance degradation due to dim environments, partial occlusions, or camera view changes. Radar has recently become increasingly popular and overcome various challenges presented by vision sensors. To capture tiny differences in radar gait signatures of different people, a dual-branch Swin Transformer is proposed, where one branch captures the time variations of the radar micro-Doppler signature and the other captures the repetitive frequency patterns in the spectrogram. Unlike natural images where objects can be translated, rotated, or scaled, the spatial coordinates of spectrograms and CVDs have unique physical meanings, and there is no affine transformation for radar targets in these synthetic images. The patch splitting mechanism in Vision Transformer makes it ideal to extract discriminant information from patches, and learn the attentive information across patches, as each patch carries some unique physical properties of radar targets. Swin Transformer consists of a set of cascaded Swin blocks to extract semantic features from shallow to deep representations, further improving the classification performance. Lastly, to highlight the branch with larger discriminant power, an Asymmetric Attention Fusion is proposed to optimally fuse the discriminant features from the two branches. To enrich the research on radar gait recognition, a large-scale NTU-RGR dataset is constructed, containing 45,768 radar frames of 98 subjects. The proposed method is evaluated on the NTU-RGR dataset and the MMRGait-1.0 database. It consistently and significantly outperforms all the compared methods on both datasets. <em>The codes are available at:</em> <span><span>https://github.com/wentaoheunnc/NTU-RGR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111101"},"PeriodicalIF":7.5000,"publicationDate":"2024-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324008525","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Video-based gait recognition suffers from potential privacy issues and performance degradation due to dim environments, partial occlusions, or camera view changes. Radar has recently become increasingly popular and overcome various challenges presented by vision sensors. To capture tiny differences in radar gait signatures of different people, a dual-branch Swin Transformer is proposed, where one branch captures the time variations of the radar micro-Doppler signature and the other captures the repetitive frequency patterns in the spectrogram. Unlike natural images where objects can be translated, rotated, or scaled, the spatial coordinates of spectrograms and CVDs have unique physical meanings, and there is no affine transformation for radar targets in these synthetic images. The patch splitting mechanism in Vision Transformer makes it ideal to extract discriminant information from patches, and learn the attentive information across patches, as each patch carries some unique physical properties of radar targets. Swin Transformer consists of a set of cascaded Swin blocks to extract semantic features from shallow to deep representations, further improving the classification performance. Lastly, to highlight the branch with larger discriminant power, an Asymmetric Attention Fusion is proposed to optimally fuse the discriminant features from the two branches. To enrich the research on radar gait recognition, a large-scale NTU-RGR dataset is constructed, containing 45,768 radar frames of 98 subjects. The proposed method is evaluated on the NTU-RGR dataset and the MMRGait-1.0 database. It consistently and significantly outperforms all the compared methods on both datasets. The codes are available at: https://github.com/wentaoheunnc/NTU-RGR.

查看原文本刊更多论文

使用双分支斯温变换器与非对称注意力融合进行雷达步态识别

基于视频的步态识别存在潜在的隐私问题，并且由于环境昏暗、部分遮挡或摄像头视角变化而导致性能下降。最近，雷达变得越来越流行，并克服了视觉传感器带来的各种挑战。为了捕捉不同人的雷达步态特征的微小差异，提出了一种双分支斯温变换器，其中一个分支捕捉雷达微多普勒特征的时间变化，另一个分支捕捉频谱图中的重复频率模式。与自然图像不同，自然图像中的物体可以平移、旋转或缩放，而频谱图和 CVD 的空间坐标具有独特的物理意义，因此这些合成图像中的雷达目标不存在仿射变换。Vision Transformer 中的补丁分割机制使其非常适合从补丁中提取判别信息，并学习补丁间的注意信息，因为每个补丁都带有雷达目标的一些独特物理特性。Swin Transformer 由一组级联 Swin 块组成，可从浅层到深层表征中提取语义特征，从而进一步提高分类性能。最后，为了突出具有更大判别能力的分支，提出了一种非对称注意力融合方法，以优化融合两个分支的判别特征。为了丰富雷达步态识别的研究，我们构建了一个大规模的 NTU-RGR 数据集，其中包含 98 个受试者的 45,768 个雷达帧。所提出的方法在 NTU-RGR 数据集和 MMRGait-1.0 数据库上进行了评估。在这两个数据集上，该方法始终明显优于所有比较过的方法。代码见：https://github.com/wentaoheunnc/NTU-RGR。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Pattern Recognition 工程技术-工程：电子与电气

CiteScore

14.40

自引率

16.20%

发文量

683

审稿时长

5.6 months

期刊介绍： The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.