Framework for detecting and recognizing sign language using absolute pose estimation difference and deep learning

IF 4.9

Machine learning with applications. Pub Date: 2025-08-11. DOI: 10.1016/j.mlwa.2025.100723

Kasian Myagila, Devotha Godfrey Nyambo, Mussa Ally Dida

Machine learning with applications, volume 21, Article 100723. https://www.sciencedirect.com/science/article/pii/S2666827025001069

Citations: 0

Abstract

Computer vision is a key approach to human activity recognition, including sign language recognition. Despite the success reported by various studies, isolating individual signs from continuous video remains a challenge. The sliding window approach is commonly used to translate continuous video, but it subjects the model to unnecessary predictions and so increases computational cost. This study proposes a framework that uses absolute pose estimation differences to isolate signs from continuous videos and translates them using a model trained on isolated signs. Pose estimation features were chosen for their proven effectiveness in various activity recognition tasks within computer vision. The proposed framework was evaluated on 10 videos of continuous signs. The framework achieved an average accuracy of 84%, while the model itself attained 95% accuracy. Moreover, analysis of the softmax outputs shows that the model is more confident on correctly classified signs, as indicated by higher average softmax scores for correct predictions. These results demonstrate the potential of the proposed framework over the sliding window approach, which tends to overwhelm the model with excessive classification sequences.
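The abstract does not say how the absolute pose differences are computed or thresholded, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each frame has already been reduced to a flat list of pose keypoint coordinates, scores motion as the mean absolute difference between consecutive frames, and treats runs of frames above a threshold as candidate signs. The names `motion_scores` and `isolate_segments`, the `threshold` value, and the `min_length` value are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One frame = flattened pose keypoints (hypothetical layout, e.g. x0, y0, x1, y1, ...).
Frame = Sequence[float]


def motion_scores(frames: Sequence[Frame]) -> List[float]:
    """Mean absolute difference between each frame and the one before it."""
    scores = []
    for prev, curr in zip(frames, frames[1:]):
        diffs = [abs(c - p) for p, c in zip(prev, curr)]
        scores.append(sum(diffs) / len(diffs))
    return scores


def isolate_segments(
    frames: Sequence[Frame],
    threshold: float = 0.05,  # hypothetical value, not taken from the paper
    min_length: int = 3,      # hypothetical value, not taken from the paper
) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges, end exclusive, where motion exceeds the threshold."""
    scores = motion_scores(frames)
    segments = []
    start = None
    for i, score in enumerate(scores):
        moving = score > threshold
        if moving and start is None:
            start = i  # motion begins between frame i and frame i + 1
        elif not moving and start is not None:
            if i - start >= min_length:
                segments.append((start, i + 1))
            start = None
    # Close a segment that is still open when the video ends.
    if start is not None and len(scores) - start >= min_length:
        segments.append((start, len(frames)))
    return segments


# Toy input: three still frames, four frames of movement, three still frames.
frames = [[0.0, 0.0]] * 3 + [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3], [0.4, 0.4]] + [[0.4, 0.4]] * 3
segments = isolate_segments(frames)
```

In a pipeline of this shape, each returned range would be passed once to a classifier trained on isolated signs, instead of classifying every position of a sliding window.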


Source journal

Machine learning with applications (Management Science and Operations Research, Artificial Intelligence, Computer Science Applications)

Self-citation rate: 0.00%

Review time: 98 days