Framework for detecting and recognizing sign language using absolute pose estimation difference and deep learning

Impact factor (IF): 4.9
Kasian Myagila, Devotha Godfrey Nyambo, Mussa Ally Dida
Journal: Machine learning with applications, volume 21, Article 100723
Publication date: 2025-08-11
Publication type: Journal Article
DOI: 10.1016/j.mlwa.2025.100723
URL: https://www.sciencedirect.com/science/article/pii/S2666827025001069
Citation count: 0

Abstract

Computer vision has been identified as one of the key solutions for human activity recognition, including sign language recognition. Despite the success demonstrated by various studies, isolating signs from continuous video remains a challenge. The sliding window approach is commonly used to translate continuous video, but it subjects the model to unnecessary predictions, which increases computational cost. This study proposes a framework that uses absolute pose estimation differences to isolate signs from continuous videos and translates them using a model trained on isolated signs. Pose estimation features were chosen because of their proven effectiveness in various activity recognition tasks within computer vision. The proposed framework was evaluated on 10 videos of continuous signs. The framework achieved an average accuracy of 84%, while the model itself attained 95% accuracy. Moreover, SoftMax output analysis shows that the model is more confident in correctly classified signs, as indicated by higher average SoftMax scores for correct predictions. This study demonstrates the potential of the proposed framework over the sliding window approach, which tends to overwhelm the model with excessive classification sequences.
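The abstract does not give the details of the segmentation step, so the following is only a minimal sketch of the general idea of isolating signs with absolute pose differences, under stated assumptions. It assumes each frame is already reduced to a flat list of keypoint coordinates, that a sign corresponds to a run of frames whose frame-to-frame absolute difference exceeds a fixed threshold, and that a sliding window baseline classifies every window position. The names `pose_motion`, `isolate_signs`, `sliding_window_calls`, `threshold`, and `min_frames` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[float]  # flattened keypoint coordinates for one frame (assumed layout)


def pose_motion(frames: Sequence[Frame]) -> List[float]:
    """Sum of absolute keypoint differences between each frame and its predecessor."""
    motion = [0.0]  # the first frame has no predecessor
    for prev, curr in zip(frames, frames[1:]):
        motion.append(sum(abs(c - p) for c, p in zip(curr, prev)))
    return motion


def isolate_signs(
    frames: Sequence[Frame], threshold: float, min_frames: int = 2
) -> List[Tuple[int, int]]:
    """Return half-open [start, end) frame ranges where motion stays above threshold."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, m in enumerate(pose_motion(frames)):
        if m > threshold and start is None:
            start = i  # motion begins: open a candidate segment
        elif m <= threshold and start is not None:
            if i - start >= min_frames:  # drop very short bursts of motion
                segments.append((start, i))
            start = None
    if start is not None and len(frames) - start >= min_frames:
        segments.append((start, len(frames)))  # video ended while still moving
    return segments


def sliding_window_calls(n_frames: int, window: int, stride: int = 1) -> int:
    """Number of classifier calls a plain sliding window would make."""
    if n_frames < window:
        return 0
    return (n_frames - window) // stride + 1


# Toy video with one keypoint (x, y): still, moving right, still, moving up, still.
demo = [[0, 0], [0, 0], [1, 0], [2, 0], [3, 0], [3, 0], [3, 0], [3, 1], [3, 2], [3, 2]]
print(isolate_signs(demo, threshold=0.5))     # [(2, 5), (7, 9)]
print(sliding_window_calls(len(demo), 3, 1))  # 8
```

In this toy case the isolated-sign classifier would be called twice, once per detected segment, whereas a stride-1 sliding window of three frames would call it eight times. In practice the threshold and minimum segment length would have to be tuned on real pose data.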
Source journal
Machine learning with applications (Management Science and Operations Research, Artificial Intelligence, Computer Science Applications)
Self-citation rate: 0.00%
Articles published: 0
Review time: 98 days