Gesture-to-Text Translation Using SURF for Indian Sign Language

Impact Factor: 3.8 · JCR Q2 (Computer Science, Information Systems)
Kaustubh Mani Tripathi, P. Kamat, S. Patil, Ruchi Jayaswal, Swati Ahirrao, K. Kotecha
{"title":"Gesture-to-Text Translation Using SURF for Indian Sign Language","authors":"Kaustubh Mani Tripathi, P. Kamat, S. Patil, Ruchi Jayaswal, Swati Ahirrao, K. Kotecha","doi":"10.3390/asi6020035","DOIUrl":null,"url":null,"abstract":"This research paper focuses on developing an effective gesture-to-text translation system using state-of-the-art computer vision techniques. The existing research on sign language translation has yet to utilize skin masking, edge detection, and feature extraction techniques to their full potential. Therefore, this study employs the speeded-up robust features (SURF) model for feature extraction, which is resistant to variations such as rotation, perspective scaling, and occlusion. The proposed system utilizes a bag of visual words (BoVW) model for gesture-to-text conversion. The study uses a dataset of 42,000 photographs consisting of alphabets (A–Z) and numbers (1–9), divided into 35 classes with 1200 shots per class. The pre-processing phase includes skin masking, where the RGB color space is converted to the HSV color space, and Canny edge detection is used for sharp edge detection. The SURF elements are grouped and converted to a visual language using the K-means mini-batch clustering technique. The proposed system’s performance is evaluated using several machine learning algorithms such as naïve Bayes, logistic regression, K nearest neighbors, support vector machine, and convolutional neural network. All the algorithms benefited from SURF, and the system’s accuracy is promising, ranging from 79% to 92%. This research study not only presents the development of an effective gesture-to-text translation system but also highlights the importance of using skin masking, edge detection, and feature extraction techniques to their full potential in sign language translation. The proposed system aims to bridge the communication gap between individuals who cannot speak and those who cannot understand Indian Sign Language (ISL).","PeriodicalId":36273,"journal":{"name":"Applied System Innovation","volume":" ","pages":""},"PeriodicalIF":3.8000,"publicationDate":"2023-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied System Innovation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/asi6020035","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

This research paper focuses on developing an effective gesture-to-text translation system using state-of-the-art computer vision techniques. Existing research on sign language translation has yet to use skin masking, edge detection, and feature extraction to their full potential. This study therefore employs the speeded-up robust features (SURF) method for feature extraction, which is robust to variations such as rotation, perspective scaling, and occlusion. The proposed system uses a bag-of-visual-words (BoVW) model for gesture-to-text conversion. The study uses a dataset of 42,000 photographs covering the letters A–Z and the numbers 1–9, divided into 35 classes with 1,200 images per class. The pre-processing phase includes skin masking, in which the RGB color space is converted to the HSV color space, followed by Canny edge detection to extract sharp edges. The SURF descriptors are then clustered into a visual vocabulary using the mini-batch K-means technique. The system's performance is evaluated with several machine learning algorithms: naïve Bayes, logistic regression, K-nearest neighbors, support vector machine, and a convolutional neural network. All the algorithms benefited from SURF, and the system's accuracy is promising, ranging from 79% to 92%. This study not only presents the development of an effective gesture-to-text translation system but also highlights the importance of exploiting skin masking, edge detection, and feature extraction to their full potential in sign language translation. The proposed system aims to bridge the communication gap between individuals who cannot speak and those who cannot understand Indian Sign Language (ISL).
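The sketch below is a minimal reconstruction of the pipeline the abstract names: HSV skin masking, Canny edge detection, SURF descriptor extraction, and a mini-batch K-means BoVW vocabulary. It assumes opencv-contrib-python built with the non-free modules enabled (SURF is patented and absent from default OpenCV wheels) and scikit-learn; the HSV skin thresholds, Hessian threshold, and vocabulary size are illustrative guesses, not values reported in the paper.

```python
# Hedged sketch of the abstract's pipeline. Assumes opencv-contrib-python
# compiled with non-free modules (for cv2.xfeatures2d.SURF_create) and
# scikit-learn. HSV thresholds, hessianThreshold, and VOCAB_SIZE are
# illustrative assumptions, not the paper's reported settings.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import SVC

VOCAB_SIZE = 200  # assumed number of visual words


def preprocess(bgr):
    """Skin-mask the image in HSV, then extract sharp edges with Canny."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # Illustrative skin range; real thresholds depend on lighting/skin tone.
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 255, 255))
    skin = cv2.bitwise_and(bgr, bgr, mask=mask)
    gray = cv2.cvtColor(skin, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, 100, 200)


def surf_descriptors(image):
    """Detect SURF keypoints and return their 64-D descriptors (or None)."""
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    _, desc = surf.detectAndCompute(image, None)
    return desc


def build_vocabulary(descriptor_list):
    """Pool descriptors from all training images and cluster them into words."""
    kmeans = MiniBatchKMeans(n_clusters=VOCAB_SIZE, random_state=0)
    kmeans.fit(np.vstack(descriptor_list))
    return kmeans


def bovw_histogram(desc, kmeans):
    """Quantize one image's descriptors and return a normalized word histogram."""
    words = kmeans.predict(desc)
    hist, _ = np.histogram(words, bins=VOCAB_SIZE, range=(0, VOCAB_SIZE))
    return hist / max(hist.sum(), 1)
```

Training then reduces to stacking one histogram per image into a feature matrix X with labels y and fitting any of the evaluated classifiers, e.g. SVC(kernel="linear").fit(X, y); at inference, the same preprocess → SURF → histogram path maps a new gesture image to a letter or digit class.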
Source Journal
Applied System Innovation (Mathematics - Applied Mathematics)
CiteScore: 7.90
Self-citation rate: 5.30%
Articles published: 102
Review time: 11 weeks