Gesture Recognition with Focuses Using Hierarchical Body Part Combination
Cheng Zhang, Yibin Hou, Jian He, Xiaoyang Xie
Tsinghua Science and Technology, vol. 30, no. 4, pp. 1583-1599, published 2025-03-03
DOI: 10.26599/TST.2024.9010059
https://ieeexplore.ieee.org/document/10908593/
Citations: 0
Abstract
Human gesture recognition is an important research field in human-computer interaction because of its potential applications in many domains, but existing methods still struggle to achieve high accuracy. To address this issue, some existing studies fuse global features with features cropped from vital body parts such as the hands; these cropped regions are called focuses. However, most methods choose the focus based on experience, and the scheme for focus selection is not discussed in detail. In this paper, a hierarchical body part combination method is proposed that takes into account the number of body parts, their combinations, and the logical relationships between them. The proposed method uses this scheme to generate multiple focuses and employs a chart-based surface modality alongside red-green-blue (RGB) and optical flow modalities to enhance each focus. A feature-level fusion scheme based on the residual connection structure is proposed to fuse the different modalities at the convolution stages, and a focus fusion scheme is proposed to learn the relevance of each focus channel for each gesture class individually. Experiments conducted on the ChaLearn isolated gesture dataset show that using multiple focuses in conjunction with multi-modal features and the proposed fusion strategies leads to better gesture recognition accuracy.
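The abstract describes focus generation and class-wise focus fusion only at a high level, so the following is a minimal Python sketch of one possible reading, not the authors' method. It assumes that a "hierarchy level" is simply the number of body parts combined, that each focus is cropped as the union bounding box of its parts' 2D keypoints, and that class-wise fusion is a softmax-weighted sum of per-focus class scores with one weight vector per class. The part names, keypoint format, and all function names are hypothetical, and the logical relationships between parts mentioned in the abstract are not modeled here.

```python
import math
from itertools import combinations

# Hypothetical body parts; the paper's actual part set is not given in the abstract.
BODY_PARTS = ("left_hand", "right_hand", "face")


def enumerate_focuses(parts, max_level):
    """Group candidate focuses by hierarchy level, assumed here to be
    the number of body parts combined into one focus."""
    levels = {}
    for k in range(1, max_level + 1):
        levels[k] = list(combinations(parts, k))
    return levels


def focus_box(keypoints, combo, margin=0):
    """Return the union bounding box (x0, y0, x1, y1) around every keypoint
    of the parts in `combo`, enlarged by `margin` pixels on each side."""
    xs = [x for part in combo for (x, _) in keypoints[part]]
    ys = [y for part in combo for (_, y) in keypoints[part]]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classwise_focus_fusion(focus_scores, class_weights):
    """Fuse per-focus predictions with a separate focus weighting per class.

    focus_scores[f][c]: score of class c predicted by focus branch f.
    class_weights[c][f]: relevance logit of focus f for class c
    (in a real model these would be learned during training).
    """
    fused = []
    for c, logits in enumerate(class_weights):
        alphas = softmax(logits)  # relevance of each focus for class c
        fused.append(sum(a * focus_scores[f][c] for f, a in enumerate(alphas)))
    return fused
```

For example, with two focus branches and two classes, equal logits give each focus a weight of 0.5 for every class, while logits such as `[[10, -10], [-10, 10]]` let class 0 rely almost entirely on the first focus and class 1 on the second. The per-class weighting is what distinguishes this from a single global attention over focuses.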
About the journal:
Tsinghua Science and Technology (Tsinghua Sci Technol) began publication in 1996. It is an international academic journal sponsored by Tsinghua University and published bimonthly. The journal aims to present up-to-date scientific achievements in computer science, electronic engineering, and other IT fields. Contributions from all over the world are welcome.