{"title":"一种探索多模态手势识别时空相关性的全局局部融合模型","authors":"Shengcai Duan;Le Wu;Aiping Liu;Xun Chen","doi":"10.1109/TMRB.2025.3550646","DOIUrl":null,"url":null,"abstract":"Hand Gesture Recognition (HGR) employing surface electromyography (sEMG) and accelerometer (ACC) signals has garnered increasing interest in areas of bionic prostheses and human-machine interaction. However, existing multimodal approaches predominantly extract global specificity at a single temporal scale, which neglects local dynamic characteristics. This limitation hinders the effective capture of global-local temporal information, resulting in restricted performance and frequent misclassification of dynamic gestures. To this end, we propose a novel global-local Fusion model, termed Temporal-spatial Dependence Fusion (TsdFusion), for sEMG-ACC-based HGR. TsdFusion harnesses temporal-spatial dependencies (Tsd) from multi-time scale handcrafted features and employs a Convolution-Transformer framework for global-local fusion, thus enriching local dynamic information while preserving global insights. Specifically, the Tsd inputs are independently constructed from sEMG and ACC through multi-time scale window segmentation and feature engineering. Furthermore, the global and local temporal-spatial correlations within unimodal Tsd inputs are characterized by the unimodal transformer and dimension-wise convolution modules, respectively. Subsequently, a Convolution-coupled-transformer progressive hierarchical fusion module effectively integrates intramodal specificity and intermodal hierarchical relationship for final prediction. Evaluations on four public datasets, including transradial amputees and healthy subjects, demonstrate TsdFusion outperforms the state-of-the-art multimodal HGR methods. The TsdFusion effectively recognizes dynamic gestures, facilitating promising HGR-based interaction for prostheses or assistance robotics.","PeriodicalId":73318,"journal":{"name":"IEEE transactions on medical robotics and bionics","volume":"7 2","pages":"723-733"},"PeriodicalIF":3.8000,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Global–Local Fusion Model Exploring Temporal–Spatial Dependence for Multimodal Hand Gesture Recognition\",\"authors\":\"Shengcai Duan;Le Wu;Aiping Liu;Xun Chen\",\"doi\":\"10.1109/TMRB.2025.3550646\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Hand Gesture Recognition (HGR) employing surface electromyography (sEMG) and accelerometer (ACC) signals has garnered increasing interest in areas of bionic prostheses and human-machine interaction. However, existing multimodal approaches predominantly extract global specificity at a single temporal scale, which neglects local dynamic characteristics. This limitation hinders the effective capture of global-local temporal information, resulting in restricted performance and frequent misclassification of dynamic gestures. To this end, we propose a novel global-local Fusion model, termed Temporal-spatial Dependence Fusion (TsdFusion), for sEMG-ACC-based HGR. TsdFusion harnesses temporal-spatial dependencies (Tsd) from multi-time scale handcrafted features and employs a Convolution-Transformer framework for global-local fusion, thus enriching local dynamic information while preserving global insights. Specifically, the Tsd inputs are independently constructed from sEMG and ACC through multi-time scale window segmentation and feature engineering. 
Furthermore, the global and local temporal-spatial correlations within unimodal Tsd inputs are characterized by the unimodal transformer and dimension-wise convolution modules, respectively. Subsequently, a Convolution-coupled-transformer progressive hierarchical fusion module effectively integrates intramodal specificity and intermodal hierarchical relationship for final prediction. Evaluations on four public datasets, including transradial amputees and healthy subjects, demonstrate TsdFusion outperforms the state-of-the-art multimodal HGR methods. The TsdFusion effectively recognizes dynamic gestures, facilitating promising HGR-based interaction for prostheses or assistance robotics.\",\"PeriodicalId\":73318,\"journal\":{\"name\":\"IEEE transactions on medical robotics and bionics\",\"volume\":\"7 2\",\"pages\":\"723-733\"},\"PeriodicalIF\":3.8000,\"publicationDate\":\"2025-03-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on medical robotics and bionics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10924260/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, BIOMEDICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on medical robotics and bionics","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10924260/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, BIOMEDICAL","Score":null,"Total":0}
A Global–Local Fusion Model Exploring Temporal–Spatial Dependence for Multimodal Hand Gesture Recognition
Hand Gesture Recognition (HGR) employing surface electromyography (sEMG) and accelerometer (ACC) signals has garnered increasing interest in the areas of bionic prostheses and human-machine interaction. However, existing multimodal approaches predominantly extract global specificity at a single temporal scale and neglect local dynamic characteristics. This limitation hinders the effective capture of global-local temporal information, resulting in restricted performance and frequent misclassification of dynamic gestures. To this end, we propose a novel global-local fusion model, termed Temporal-spatial Dependence Fusion (TsdFusion), for sEMG-ACC-based HGR. TsdFusion harnesses temporal-spatial dependencies (Tsd) from multi-time-scale handcrafted features and employs a Convolution-Transformer framework for global-local fusion, thus enriching local dynamic information while preserving global insights. Specifically, the Tsd inputs are constructed independently from sEMG and ACC through multi-time-scale window segmentation and feature engineering. Furthermore, the global and local temporal-spatial correlations within the unimodal Tsd inputs are characterized by the unimodal transformer and dimension-wise convolution modules, respectively. Subsequently, a convolution-coupled-transformer progressive hierarchical fusion module effectively integrates intramodal specificity and intermodal hierarchical relationships for the final prediction. Evaluations on four public datasets, including transradial amputees and healthy subjects, demonstrate that TsdFusion outperforms state-of-the-art multimodal HGR methods. TsdFusion effectively recognizes dynamic gestures, facilitating promising HGR-based interaction for prostheses and assistive robotics.
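As a rough illustration of the pipeline the abstract describes, the sketch below segments sEMG and ACC signals at several window lengths, extracts a simple per-window feature, and passes each modality through a depthwise-convolution branch (local patterns) and a transformer-encoder branch (global dependencies) before a late fusion head. The window lengths, the RMS feature, the layer sizes, and the plain concatenation fusion are illustrative assumptions only; the paper's actual handcrafted features and its convolution-coupled-transformer progressive hierarchical fusion module are not reproduced here.

```python
# Minimal PyTorch sketch of the ideas in the abstract: multi-time-scale window
# segmentation of sEMG/ACC, a dimension-wise (depthwise) convolution branch for
# local dependencies, and a transformer branch for global dependencies, followed
# by a simple late-fusion head. All sizes and feature choices are assumptions.
import torch
import torch.nn as nn


def multi_scale_rms(signal: torch.Tensor, window_sizes=(32, 64, 128)) -> torch.Tensor:
    """Segment a (batch, channels, time) signal with several window lengths,
    compute a root-mean-square value per window, and concatenate along time.
    RMS stands in for the paper's handcrafted features."""
    feats = []
    for w in window_sizes:
        # Non-overlapping windows of length w: (batch, channels, n_windows, w)
        windows = signal.unfold(dimension=-1, size=w, step=w)
        feats.append(windows.pow(2).mean(dim=-1).sqrt())
    return torch.cat(feats, dim=-1)  # (batch, channels, total_windows)


class GlobalLocalBranch(nn.Module):
    """One unimodal branch: depthwise Conv1d for local patterns plus a
    transformer encoder for global temporal-spatial dependencies."""

    def __init__(self, channels: int, d_model: int = 64):
        super().__init__()
        # Depthwise convolution: one filter per sensor channel.
        self.local = nn.Conv1d(channels, channels, kernel_size=3,
                               padding=1, groups=channels)
        self.proj = nn.Linear(channels, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.global_enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, steps)
        local = self.local(x)                       # local temporal patterns
        tokens = self.proj(local.transpose(1, 2))   # (batch, steps, d_model)
        return self.global_enc(tokens).mean(dim=1)  # pooled global descriptor


class TwoModalFusion(nn.Module):
    """Late fusion of sEMG and ACC branch descriptors into gesture logits."""

    def __init__(self, semg_ch: int, acc_ch: int, n_classes: int):
        super().__init__()
        self.semg = GlobalLocalBranch(semg_ch)
        self.acc = GlobalLocalBranch(acc_ch)
        self.head = nn.Linear(2 * 64, n_classes)  # two 64-dim branch descriptors

    def forward(self, semg, acc):
        fused = torch.cat([self.semg(semg), self.acc(acc)], dim=-1)
        return self.head(fused)


# Toy usage: 8-channel sEMG and 3-axis ACC, both 1024 samples long, 10 gestures.
if __name__ == "__main__":
    semg = multi_scale_rms(torch.randn(2, 8, 1024))
    acc = multi_scale_rms(torch.randn(2, 3, 1024))
    logits = TwoModalFusion(8, 3, n_classes=10)(semg, acc)
    print(logits.shape)  # torch.Size([2, 10])
```

In this sketch, the depthwise convolution (groups equal to the number of channels) processes each sensor channel separately, loosely mirroring the dimension-wise convolution idea, while mean pooling over the transformer tokens yields a compact global descriptor per modality before concatenation.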