ST-CGNet: A spatiotemporal gesture recognition network with triplet attention and dual feature fusion
Authors: Jing Hu, Songtao Liu, Mingzhou Liu, Tingyu Zhou, Jiale Lu, Xingyan Zuo
Journal: Pattern Recognition, volume 167, Article 111767 (published 2025-05-12)
Impact factor: 7.5; JCR: Q1 (Computer Science, Artificial Intelligence)
DOI: 10.1016/j.patcog.2025.111767
URL: https://www.sciencedirect.com/science/article/pii/S0031320325004273
Gesture recognition, a critical area of human–computer interaction, faces significant challenges in modeling complex spatiotemporal dynamics and adapting to gesture diversity. This paper proposes ST-CGNet, a framework that captures multi-scale spatiotemporal features by integrating an optimized C3D network with a lightweight GatedConvLSTM. The C3D module extracts short-term spatiotemporal features, while the GatedConvLSTM captures long-term dependencies through a gating mechanism. To increase sensitivity to dynamic variations in gestures, a TripletAttention3D module strengthens the model’s focus on salient motion patterns. An adaptive fusion strategy then dynamically weights and integrates the features from both branches, improving performance across diverse gesture types. Experiments on the Jester and EgoGesture datasets show that the proposed method significantly outperforms baseline models in recognition accuracy and generalization, particularly on complex gesture sequences. These results support the proposed approach as a promising solution for dynamic gesture recognition.
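The abstract does not give the formula behind the adaptive fusion strategy. The following is a minimal sketch of one common way to weight two branches dynamically, assuming each branch produces a feature vector of the same length and that a softmax over per-branch scalar scores yields the fusion weights. The names `adaptive_fuse`, `score_weights_short`, and `score_weights_long` are hypothetical and do not come from the paper; in a trained network the scoring vectors would be learned parameters.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(
    short_term: Sequence[float],
    long_term: Sequence[float],
    score_weights_short: Sequence[float],
    score_weights_long: Sequence[float],
) -> Tuple[List[float], Tuple[float, float]]:
    """Fuse two same-length feature vectors with input-dependent weights.

    Assumption (not from the paper): each branch gets a scalar score from a
    dot product with a hypothetical learned vector, and a softmax turns the
    two scores into weights that sum to 1.
    """
    if len(short_term) != len(long_term):
        raise ValueError("branch features must have the same length")
    score_s = sum(w * x for w, x in zip(score_weights_short, short_term))
    score_l = sum(w * x for w, x in zip(score_weights_long, long_term))
    alpha_s, alpha_l = softmax([score_s, score_l])
    # Weighted sum of the short-term (C3D-like) and long-term (ConvLSTM-like)
    # branch features, element by element.
    fused = [alpha_s * s + alpha_l * l for s, l in zip(short_term, long_term)]
    return fused, (alpha_s, alpha_l)


# Toy usage: equal scores give equal weights to both branches.
fused, weights = adaptive_fuse([1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.5, 0.5])
print(fused, weights)
```

Because the weights depend on the input features, a sequence dominated by long-range motion can shift weight toward the recurrent branch, while a short, abrupt gesture can shift it toward the convolutional branch. This is the general idea of dynamic weighting, not a claim about the authors' exact design.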
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.