Conversion of Sign Language To Text And Speech Using Machine Learning Techniques

Victoria Adewale, A. Olamiti
Journal of Research and Review in Science, vol. 10, 1 December 2018. DOI: 10.36108/jrrslasu/8102/50(0170). Citations: 9.

Abstract

Introduction: Communicating with hearing-impaired (deaf/mute) people remains a significant challenge in society today, because their means of communication (sign language, or hand gestures at a local level) requires an interpreter in every interaction. Converting sign-language images to text and speech can therefore benefit both hearing and hearing-impaired people in day-to-day interaction. To this end, this research aimed at converting American Sign Language (ASL) images to text and speech. Methodology: The techniques of image segmentation and feature detection played a crucial role in implementing the system. We formulate the interaction between image segmentation and object recognition in the framework of the FAST and SURF algorithms. The system passes through several phases: data capture with a Kinect sensor, image segmentation, feature detection and extraction from the region of interest (ROI), supervised and unsupervised classification of images with the K-Nearest Neighbour (KNN) algorithm, and text-to-speech (TTS) conversion. Combining FAST and SURF with KNN (k = 10) also showed that unsupervised classification can determine the best-matched feature from the existing database; in turn, the best match was converted to text as well as speech. Result: The system achieved 78% accuracy in unsupervised feature learning. Conclusion: The success of this work can be attributed to the effective classification, which improved the unsupervised feature learning of the different images. Pre-determining the ROI of each image using SURF and FAST demonstrated the ability of the proposed algorithm to limit image modelling to the relevant region within the image.
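The abstract gives no implementation details beyond the pipeline stages, but the KNN matching step (k = 10) it describes can be sketched in general terms: each sign image is reduced to a feature vector (in the paper, FAST/SURF features from the ROI), and the query is assigned the label of its nearest neighbours in the existing database, after which the matched label is passed to TTS. The sketch below is illustrative only, not the authors' code: `knn_classify` and the toy two-dimensional vectors are hypothetical stand-ins for real SURF descriptors.

```python
from collections import Counter
import math

def knn_classify(query, database, k=10):
    """Return the best-matched sign label by a k-nearest-neighbour vote.

    query:    feature vector for the captured sign (here a toy tuple;
              in the paper this would come from FAST/SURF on the ROI).
    database: list of (feature_vector, label) pairs, the existing database.
    """
    # Rank all database entries by Euclidean distance to the query.
    ranked = sorted((math.dist(query, vec), label) for vec, label in database)
    # Majority vote among the k closest entries gives the best match.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy database: two clusters of fake "descriptors" for the signs A and B.
database = [((i * 0.1, i * 0.1), "A") for i in range(10)]
database += [((5 + i * 0.1, 5 + i * 0.1), "B") for i in range(10)]

text = knn_classify((0.2, 0.1), database, k=10)  # best match -> "A"
```

The resulting text would then be handed to a TTS engine, the final stage of the pipeline described above.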