MSE-GCN: A Multiscale Spatiotemporal Feature Aggregation Enhanced Efficient Graph Convolutional Network for Dynamic Sign Language Recognition

IF 5.3 | CAS Tier 3, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Neelma Naz;Hasan Sajid;Sara Ali;Osman Hasan;Muhammad Khurram Ehsan
DOI: 10.1109/TETCI.2024.3509500
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 9, no. 4, pp. 2979-2994
Publication date: 2024-12-13 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10799160/
Citations: 0

Abstract

Graph convolutional networks have emerged as an active area of research for skeleton-based sign language recognition (SLR). One essential problem in this approach is to efficiently extract the most discriminative features, capable of modeling short-range and long-range spatial and temporal information over all skeleton joints, while ensuring low inference costs. To address this issue, we propose a novel multi-scale efficient graph convolutional network (MSE-GCN) for skeleton-based SLR. The proposed network uses separable convolution layers in a multi-scale configuration, embedded in a multi-branch (MB) network along with an early fusion scheme, resulting in an accurate, computationally efficient, and faster system. In addition, we propose a novel hybrid attention module, named Spatial Temporal Joint Part Attention (ST-JPA), to identify the most important body parts as well as the most informative joints in specific frames across the whole sign sequence. The performance of the proposed network (MSE-GCN) is evaluated on five challenging sign language datasets, WLASL-100, WLASL-300, WLASL-1000, MINDS-Libras, and LIBRAS-UFOP, achieving state-of-the-art (SOTA) accuracies of 85.27%, 81.59%, 71.75%, 97.442 ± 1.01%, and 88.59 ± 3.60%, respectively, while incurring lower computational costs.
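The efficiency gain the abstract attributes to separable convolution layers can be illustrated with a minimal sketch. The following is not the paper's architecture; it is a hedged NumPy illustration of a depthwise-separable temporal convolution over a skeleton sequence, with illustrative (assumed) shapes, kernel size, and channel counts, showing the parameter savings over a standard temporal convolution.

```python
import numpy as np

def separable_temporal_conv(x, depthwise, pointwise):
    """Depthwise-separable temporal convolution over a skeleton sequence.

    x:         (T, V, C)   T frames, V joints, C channels per joint
    depthwise: (k, C)      one temporal filter of length k per channel
    pointwise: (C, C_out)  1x1 convolution mixing channels
    """
    k, C = depthwise.shape
    T, V, _ = x.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0), (0, 0)))  # zero-pad along time

    # Depthwise stage: each channel is filtered independently along time.
    dw = np.zeros_like(x)
    for t in range(T):
        window = xp[t:t + k]                       # (k, V, C)
        dw[t] = np.einsum('kvc,kc->vc', window, depthwise)

    # Pointwise stage: mix channels with a 1x1 convolution.
    return dw @ pointwise                          # (T, V, C_out)

# Illustrative sizes (assumptions, not the paper's configuration).
T, V, C, C_out, k = 8, 27, 16, 32, 5
x = np.random.randn(T, V, C)
out = separable_temporal_conv(x, np.random.randn(k, C),
                              np.random.randn(C, C_out))
print(out.shape)  # (8, 27, 32)

# Parameter comparison: factorizing the full temporal convolution into a
# depthwise plus a pointwise stage shrinks the parameter count from
# k*C*C_out to k*C + C*C_out.
standard = k * C * C_out       # 2560
separable = k * C + C * C_out  # 592
print(standard, separable)
```

Stacking such blocks with different temporal kernel sizes in parallel branches is one common way to obtain the kind of multi-scale, multi-branch aggregation the abstract describes.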
Source journal metrics: CiteScore 10.30 | Self-citation rate 7.50% | Articles published per year: 147
Journal description: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronics-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.