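Advancing human-computer interaction: AI-driven translation of American Sign Language to Nepali using convolutional neural networks and text-to-speech conversion application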

Systems and Soft Computing, Vol. 6, Article 200165. Published: 2024-10-29. DOI: 10.1016/j.sasc.2024.200165

Biplov Paneru, Bishwash Paneru, Khem Narayan Poudyal

Full text (ScienceDirect): https://www.sciencedirect.com/science/article/pii/S2772941924000942

Citations: 0

Abstract

Advanced technology that serves people with disabilities is severely lacking in Nepal, especially technology that helps people with hearing impairments communicate. Although sign language is one of the oldest and most natural forms of communication, few resources exist in Nepal to bridge the communication gap between Nepali and American Sign Language (ASL). This study investigates the use of convolutional neural networks (CNNs) and other AI-driven methods to translate ASL into Nepali text and speech. Two pre-trained models, ResNet50 and VGG16, were fine-tuned through transfer learning on large ASL image datasets to classify ASL signs. Recognized signs are rendered as Nepali text, which the Python gTTS package converts to speech; the pipeline is integrated into a Tkinter-based graphical user interface (GUI) that receives video input through OpenCV. Both CNN architectures achieved over 99% accuracy, enabling smooth conversion of ASL signs into spoken output. By offering a workable tool for inclusion and communication, this AI-driven translation system is a significant step toward lowering the technological barriers that people with disabilities in Nepal face.
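The abstract does not say how frame-by-frame CNN predictions from the OpenCV video feed become discrete Nepali text before gTTS speaks them. The following is a minimal sketch of one common way to do this, assuming a majority vote over a sliding window of frames plus a confidence threshold; this debouncing step is an illustrative assumption, not the authors' confirmed method. The class `SignToTextStabilizer`, the dictionary `LABEL_TO_NEPALI`, its two sample labels, and the defaults `window=5` and `min_conf=0.9` are all hypothetical.

```python
from collections import Counter, deque
from typing import Callable, Deque, Dict, Optional

# Hypothetical label set: the abstract does not list the classes or the
# Nepali strings the authors actually use.
LABEL_TO_NEPALI: Dict[str, str] = {
    "hello": "नमस्ते",
    "thank_you": "धन्यवाद",
}


class SignToTextStabilizer:
    """Debounce per-frame classifier output into discrete Nepali text.

    A label is emitted once it holds a strict majority of the last `window`
    frames (frames below `min_conf` count as "no sign") and differs from the
    label emitted last, so a sign held in view is spoken only once.
    """

    def __init__(
        self,
        label_map: Dict[str, str],
        window: int = 5,
        min_conf: float = 0.9,
        speak: Optional[Callable[[str], None]] = None,
    ) -> None:
        self.label_map = label_map
        self.window = window
        self.min_conf = min_conf
        self.speak = speak  # hypothetical hook, e.g. a wrapper around a TTS engine
        self.history: Deque[str] = deque(maxlen=window)
        self.last_emitted: Optional[str] = None

    def update(self, label: str, confidence: float) -> Optional[str]:
        """Feed one frame's prediction; return Nepali text if a sign is emitted."""
        # Uncertain frames are stored as "" (no sign) so they break up runs.
        self.history.append(label if confidence >= self.min_conf else "")
        if len(self.history) < self.window:
            return None  # not enough frames yet
        top, count = Counter(self.history).most_common(1)[0]
        if count <= self.window // 2:
            return None  # no strict majority
        if top == "":
            self.last_emitted = None  # pause detected: allow the same sign again
            return None
        if top == self.last_emitted:
            return None  # sign is still being held
        self.last_emitted = top
        text = self.label_map.get(top)  # None for labels without a mapping
        if text is not None and self.speak is not None:
            self.speak(text)
        return text


# Usage with a simulated stream of (label, confidence) pairs.
spoken = []
stabilizer = SignToTextStabilizer(LABEL_TO_NEPALI, speak=spoken.append)
frames = (
    [("hello", 0.95)] * 6        # held sign: spoken once
    + [("hello", 0.50)] * 5      # low confidence: treated as a pause
    + [("hello", 0.97)] * 5      # same sign again after the pause
    + [("thank_you", 0.99)] * 5
)
outputs = []
for lbl, conf in frames:
    text = stabilizer.update(lbl, conf)
    if text is not None:
        outputs.append(text)
```

In a live setup, the `(label, confidence)` pairs would come from running the fine-tuned classifier on frames read with OpenCV's `cv2.VideoCapture`, and `speak` could wrap gTTS, for example `lambda text: gTTS(text, lang="ne").save("sign.mp3")`, assuming the installed gTTS version supports the Nepali code `ne`. Keeping the speech callback injectable lets the debouncing logic be tested without a camera or network access.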


Source journal: Systems and Soft Computing
CiteScore: 2.20
Self-citation rate: 0.00%