STr-GCN: Dual Spatial Graph Convolutional Network and Transformer Graph Encoder for 3D Hand Gesture Recognition
Authors: Rim Slama, W. Rabah, H. Wannous
Venue: 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG)
Published: 2023-01-05
DOI: https://doi.org/10.1109/FG57933.2023.10042643
Skeleton-based hand gesture recognition is a challenging task that has attracted considerable attention in recent years, especially with the rise of Graph Neural Networks. In this paper, we propose a new deep learning architecture for hand gesture recognition from 3D hand skeleton data, which we call STr-GCN. It decouples the spatial and temporal learning of the gesture by leveraging Graph Convolutional Networks (GCN) and Transformers. The key idea is to combine two powerful networks: a Spatial Graph Convolutional Network unit that models intra-frame interactions to extract features from the different hand joints, and a Transformer Graph Encoder, based on a Temporal Self-Attention module, that captures inter-frame correlations. We evaluate our method on three benchmarks: the SHREC'17 Track dataset, the Briareo dataset, and the First Person Hand Action dataset. The experiments show the effectiveness of our approach, which matches or outperforms the state of the art. The code to reproduce our results is available online.
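The decoupling described in the abstract (a graph convolution within each frame, followed by self-attention across frames) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the symmetric adjacency normalization, the single GCN layer, the mean-pooling over joints, the identity query/key/value projections, and all function names (`normalize_adjacency`, `spatial_gcn`, `temporal_self_attention`, `encode_gesture`) are assumptions chosen only to keep the example short and dependency-free.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, a common GCN normalization (assumed here)."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def spatial_gcn(frame, a_hat, weight):
    """Intra-frame step: one graph convolution, ReLU(A_hat X W), over the joints."""
    out = matmul(matmul(a_hat, frame), weight)
    return [[max(0.0, v) for v in row] for row in out]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_self_attention(seq):
    """Inter-frame step: scaled dot-product self-attention over time.

    Identity query/key/value projections are an assumption made for brevity;
    a trained model would use learned projection matrices.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, seq)) for c in range(d)])
    return out

def encode_gesture(frames, adj, weight):
    """Spatial GCN per frame, mean-pool joints, attend over time, mean-pool frames."""
    a_hat = normalize_adjacency(adj)
    per_frame = []
    for frame in frames:
        h = spatial_gcn(frame, a_hat, weight)
        per_frame.append([sum(col) / len(h) for col in zip(*h)])  # pool over joints
    attended = temporal_self_attention(per_frame)
    return [sum(col) / len(attended) for col in zip(*attended)]  # pool over time

# Toy input: 2 frames, 3 joints connected in a chain, 3D coordinates per joint.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
frames = [
    [[0.0, 0.0, 0.0], [0.1, 0.2, 0.0], [0.2, 0.4, 0.1]],
    [[0.0, 0.1, 0.0], [0.1, 0.3, 0.1], [0.3, 0.5, 0.2]],
]
weight = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]]  # hypothetical 3 -> 2 projection
embedding = encode_gesture(frames, adj, weight)
print(embedding)
```

The resulting vector has one entry per output feature of `weight` and would typically be passed to a classifier over gesture classes; that step, along with training, is omitted here.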