{"title":"基于绝对姿态估计差分和深度学习的手语检测与识别框架","authors":"Kasian Myagila , Devotha Godfrey Nyambo , Mussa Ally Dida","doi":"10.1016/j.mlwa.2025.100723","DOIUrl":null,"url":null,"abstract":"<div><div>Computer vision has been identified as one of the key solutions for human activity recognition, including sign language recognition. Despite the success demonstrated by various studies, isolating signs from continuous video remains a challenge. The sliding window approach has been commonly used for translating continuous video. However, this method subjects the model to unnecessary predictions, leading to increased computational costs. This study proposes a framework that use absolute pose estimation differences to isolate signs from continuous videos and translate them using a model trained on isolated signs. Pose estimation features were chosen due to their proven effectiveness in various activity recognition tasks within computer vision. The proposed framework was evaluated on 10 videos of continuous signs. According to the findings, the framework achieved an average accuracy of 84%, while the model itself attained 95% accuracy. Moreover, SoftMax output analysis shows that the model exhibits higher confidence in correctly classified signs, as indicated by higher average SoftMax scores for correct predictions. 
This study demonstrates the potential of the proposed framework over the sliding window approach, which tends to overwhelm the model with excessive classification sequences.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"21 ","pages":"Article 100723"},"PeriodicalIF":4.9000,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Framework for detecting and recognizing sign language using absolute pose estimation difference and deep learning\",\"authors\":\"Kasian Myagila , Devotha Godfrey Nyambo , Mussa Ally Dida\",\"doi\":\"10.1016/j.mlwa.2025.100723\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Computer vision has been identified as one of the key solutions for human activity recognition, including sign language recognition. Despite the success demonstrated by various studies, isolating signs from continuous video remains a challenge. The sliding window approach has been commonly used for translating continuous video. However, this method subjects the model to unnecessary predictions, leading to increased computational costs. This study proposes a framework that use absolute pose estimation differences to isolate signs from continuous videos and translate them using a model trained on isolated signs. Pose estimation features were chosen due to their proven effectiveness in various activity recognition tasks within computer vision. The proposed framework was evaluated on 10 videos of continuous signs. According to the findings, the framework achieved an average accuracy of 84%, while the model itself attained 95% accuracy. Moreover, SoftMax output analysis shows that the model exhibits higher confidence in correctly classified signs, as indicated by higher average SoftMax scores for correct predictions. 
This study demonstrates the potential of the proposed framework over the sliding window approach, which tends to overwhelm the model with excessive classification sequences.</div></div>\",\"PeriodicalId\":74093,\"journal\":{\"name\":\"Machine learning with applications\",\"volume\":\"21 \",\"pages\":\"Article 100723\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-08-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine learning with applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666827025001069\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025001069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
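The core idea of the framework, segmenting signs by thresholding the absolute frame-to-frame difference of pose landmarks rather than sliding a window over every position, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the landmark array, threshold value, and minimum-length parameter are assumptions, and in practice the landmarks would come from a pose estimator such as MediaPipe.

```python
import numpy as np

def segment_signs(poses, threshold=0.02, min_len=5):
    """Split a continuous pose sequence into candidate sign segments.

    poses: (T, K, 2) array of K 2-D pose landmarks per frame
    (assumed to come from a pose estimator; synthetic here).
    A frame is treated as "moving" when the mean absolute landmark
    displacement from the previous frame exceeds `threshold`; runs of
    moving frames at least `min_len` long become candidate signs.
    Returns a list of (start_frame, end_frame) index pairs.
    """
    # Mean absolute per-landmark displacement between consecutive frames.
    diffs = np.abs(np.diff(poses, axis=0)).mean(axis=(1, 2))  # shape (T-1,)
    moving = diffs > threshold

    segments, start = [], None
    for i, m in enumerate(moving):
        if m and start is None:
            start = i                      # motion begins
        elif not m and start is not None:
            if i - start >= min_len:       # keep only sufficiently long runs
                segments.append((start, i))
            start = None
    if start is not None and len(moving) - start >= min_len:
        segments.append((start, len(moving)))
    return segments
```

Each detected segment would then be passed to the isolated-sign classifier, so the model only predicts on spans that actually contain motion, in contrast to the sliding-window approach, which queries the model at every window position.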