{"title":"一种探索多模态手势识别时空相关性的全局局部融合模型","authors":"Shengcai Duan;Le Wu;Aiping Liu;Xun Chen","doi":"10.1109/TMRB.2025.3550646","DOIUrl":null,"url":null,"abstract":"Hand Gesture Recognition (HGR) employing surface electromyography (sEMG) and accelerometer (ACC) signals has garnered increasing interest in areas of bionic prostheses and human-machine interaction. However, existing multimodal approaches predominantly extract global specificity at a single temporal scale, which neglects local dynamic characteristics. This limitation hinders the effective capture of global-local temporal information, resulting in restricted performance and frequent misclassification of dynamic gestures. To this end, we propose a novel global-local Fusion model, termed Temporal-spatial Dependence Fusion (TsdFusion), for sEMG-ACC-based HGR. TsdFusion harnesses temporal-spatial dependencies (Tsd) from multi-time scale handcrafted features and employs a Convolution-Transformer framework for global-local fusion, thus enriching local dynamic information while preserving global insights. Specifically, the Tsd inputs are independently constructed from sEMG and ACC through multi-time scale window segmentation and feature engineering. Furthermore, the global and local temporal-spatial correlations within unimodal Tsd inputs are characterized by the unimodal transformer and dimension-wise convolution modules, respectively. Subsequently, a Convolution-coupled-transformer progressive hierarchical fusion module effectively integrates intramodal specificity and intermodal hierarchical relationship for final prediction. Evaluations on four public datasets, including transradial amputees and healthy subjects, demonstrate TsdFusion outperforms the state-of-the-art multimodal HGR methods. The TsdFusion effectively recognizes dynamic gestures, facilitating promising HGR-based interaction for prostheses or assistance robotics.","PeriodicalId":73318,"journal":{"name":"IEEE transactions on medical robotics and bionics","volume":"7 2","pages":"723-733"},"PeriodicalIF":3.8000,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Global–Local Fusion Model Exploring Temporal–Spatial Dependence for Multimodal Hand Gesture Recognition\",\"authors\":\"Shengcai Duan;Le Wu;Aiping Liu;Xun Chen\",\"doi\":\"10.1109/TMRB.2025.3550646\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hand Gesture Recognition (HGR) employing surface electromyography (sEMG) and accelerometer (ACC) signals has garnered increasing interest in areas of bionic prostheses and human-machine interaction. However, existing multimodal approaches predominantly extract global specificity at a single temporal scale, which neglects local dynamic characteristics. This limitation hinders the effective capture of global-local temporal information, resulting in restricted performance and frequent misclassification of dynamic gestures. To this end, we propose a novel global-local Fusion model, termed Temporal-spatial Dependence Fusion (TsdFusion), for sEMG-ACC-based HGR. TsdFusion harnesses temporal-spatial dependencies (Tsd) from multi-time scale handcrafted features and employs a Convolution-Transformer framework for global-local fusion, thus enriching local dynamic information while preserving global insights. Specifically, the Tsd inputs are independently constructed from sEMG and ACC through multi-time scale window segmentation and feature engineering. 
Furthermore, the global and local temporal-spatial correlations within unimodal Tsd inputs are characterized by the unimodal transformer and dimension-wise convolution modules, respectively. Subsequently, a Convolution-coupled-transformer progressive hierarchical fusion module effectively integrates intramodal specificity and intermodal hierarchical relationship for final prediction. Evaluations on four public datasets, including transradial amputees and healthy subjects, demonstrate TsdFusion outperforms the state-of-the-art multimodal HGR methods. The TsdFusion effectively recognizes dynamic gestures, facilitating promising HGR-based interaction for prostheses or assistance robotics.\",\"PeriodicalId\":73318,\"journal\":{\"name\":\"IEEE transactions on medical robotics and bionics\",\"volume\":\"7 2\",\"pages\":\"723-733\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on medical robotics and bionics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10924260/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical robotics and bionics","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10924260/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
A Global–Local Fusion Model Exploring Temporal–Spatial Dependence for Multimodal Hand Gesture Recognition
Hand Gesture Recognition (HGR) employing surface electromyography (sEMG) and accelerometer (ACC) signals has garnered increasing interest in the areas of bionic prostheses and human-machine interaction. However, existing multimodal approaches predominantly extract global specificity at a single temporal scale and neglect local dynamic characteristics. This limitation hinders the effective capture of global-local temporal information, resulting in restricted performance and frequent misclassification of dynamic gestures. To this end, we propose a novel global-local fusion model, termed Temporal-spatial Dependence Fusion (TsdFusion), for sEMG-ACC-based HGR. TsdFusion harnesses temporal-spatial dependencies (Tsd) from multi-time-scale handcrafted features and employs a Convolution-Transformer framework for global-local fusion, thus enriching local dynamic information while preserving global insights. Specifically, the Tsd inputs are constructed independently from sEMG and ACC through multi-time-scale window segmentation and feature engineering. Furthermore, the global and local temporal-spatial correlations within the unimodal Tsd inputs are characterized by the unimodal transformer and dimension-wise convolution modules, respectively. Subsequently, a convolution-coupled-transformer progressive hierarchical fusion module effectively integrates intramodal specificity and intermodal hierarchical relationships for the final prediction. Evaluations on four public datasets, including transradial amputees and healthy subjects, demonstrate that TsdFusion outperforms state-of-the-art multimodal HGR methods. TsdFusion effectively recognizes dynamic gestures, facilitating promising HGR-based interaction for prostheses and assistive robotics.
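As a rough illustration of the pipeline the abstract describes, the sketch below segments sEMG and ACC signals at several window lengths, extracts a simple per-window feature, and passes each modality through a depthwise-convolution branch (local patterns) and a transformer-encoder branch (global dependencies) before a late fusion head. The window lengths, the RMS feature, the layer sizes, and the plain concatenation fusion are illustrative assumptions only; the paper's actual handcrafted features and its convolution-coupled-transformer progressive hierarchical fusion module are not reproduced here.

```python
# Minimal PyTorch sketch of the ideas in the abstract: multi-time-scale window
# segmentation of sEMG/ACC, a dimension-wise (depthwise) convolution branch for
# local dependencies, and a transformer branch for global dependencies, followed
# by a simple late-fusion head. All sizes and feature choices are assumptions.
import torch
import torch.nn as nn


def multi_scale_rms(signal: torch.Tensor, window_sizes=(32, 64, 128)) -> torch.Tensor:
    """Segment a (batch, channels, time) signal with several window lengths,
    compute a root-mean-square value per window, and concatenate along time.
    RMS stands in for the paper's handcrafted features."""
    feats = []
    for w in window_sizes:
        # Non-overlapping windows of length w: (batch, channels, n_windows, w)
        windows = signal.unfold(dimension=-1, size=w, step=w)
        feats.append(windows.pow(2).mean(dim=-1).sqrt())
    return torch.cat(feats, dim=-1)  # (batch, channels, total_windows)


class GlobalLocalBranch(nn.Module):
    """One unimodal branch: depthwise Conv1d for local patterns plus a
    transformer encoder for global temporal-spatial dependencies."""

    def __init__(self, channels: int, d_model: int = 64):
        super().__init__()
        # Depthwise convolution: one filter per sensor channel.
        self.local = nn.Conv1d(channels, channels, kernel_size=3,
                               padding=1, groups=channels)
        self.proj = nn.Linear(channels, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.global_enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, steps)
        local = self.local(x)                       # local temporal patterns
        tokens = self.proj(local.transpose(1, 2))   # (batch, steps, d_model)
        return self.global_enc(tokens).mean(dim=1)  # pooled global descriptor


class TwoModalFusion(nn.Module):
    """Late fusion of sEMG and ACC branch descriptors into gesture logits."""

    def __init__(self, semg_ch: int, acc_ch: int, n_classes: int):
        super().__init__()
        self.semg = GlobalLocalBranch(semg_ch)
        self.acc = GlobalLocalBranch(acc_ch)
        self.head = nn.Linear(2 * 64, n_classes)  # two 64-dim branch descriptors

    def forward(self, semg, acc):
        fused = torch.cat([self.semg(semg), self.acc(acc)], dim=-1)
        return self.head(fused)


# Toy usage: 8-channel sEMG and 3-axis ACC, both 1024 samples long, 10 gestures.
if __name__ == "__main__":
    semg = multi_scale_rms(torch.randn(2, 8, 1024))
    acc = multi_scale_rms(torch.randn(2, 3, 1024))
    logits = TwoModalFusion(8, 3, n_classes=10)(semg, acc)
    print(logits.shape)  # torch.Size([2, 10])
```

In this sketch, the depthwise convolution (groups equal to the number of channels) processes each sensor channel separately, loosely mirroring the dimension-wise convolution idea, while mean pooling over the transformer tokens yields a compact global descriptor per modality before concatenation.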