Novel Machine Learning based Text-To-Speech Device for Visually Impaired People
U. Gawande, Nutan Rathod, Pooja Bodkhe, Pradnya Kolhe, Hema Amlani, Chetana B. Thaokar
2023 2nd International Conference on Smart Technologies and Systems for Next Generation Computing (ICSTSN), published 2023-04-21
DOI: 10.1109/ICSTSN57873.2023.10151637
Citations: 0
Abstract
Document images contain text and non-textual characters, which text-to-speech (TTS) systems convert into voice or audio. Blind individuals stand to benefit from this TTS technology. According to the World Health Organization, 1.99% of India's population is blind; aiding the blind is therefore necessary. This research proposes a machine learning based text-to-speech converter. First, the proposed text-to-speech conversion algorithm is embedded on a Raspberry Pi. Second, a camera captures document images, which serve as input to the TTS unit. Third, the TTS unit on the Raspberry Pi processes the captured image, and its output is amplified using an audio amplifier. Fourth, the amplified signal is sent to the speaker, so the user hears the text read aloud. The process entails text extraction from the image followed by text-to-speech conversion: with a camera module and a Raspberry Pi, optical character recognition (OCR) extracts the text, which is then converted to speech. The setup consists of a Raspberry Pi with a webcam interface, and the audio output can be heard through speakers or headphones. The conversion takes a few seconds. This device can make it easier for visually impaired people to read text from images. Experimental results show a significant improvement over state-of-the-art text-to-speech conversion algorithms. The paper concludes with future research directions.
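The capture → OCR → TTS → amplifier pipeline described above can be sketched as a chain of functions. This is a minimal illustration, not the authors' implementation: the abstract does not name the OCR or speech libraries used, so the back-ends below are stand-ins (a real build on a Raspberry Pi might use, for example, Tesseract for OCR and eSpeak for synthesis).

```python
# Sketch of the four-step pipeline from the abstract:
# capture image -> extract text (OCR) -> synthesize speech -> output to speaker.
# The OCR and TTS back-ends are hypothetical stand-ins; the abstract does not
# specify which libraries the device uses.

def capture_image(camera):
    """Step 1-2: grab one document frame; `camera` is any zero-arg callable."""
    return camera()

def extract_text(image):
    """OCR step: recover printed text from the image.
    A real build might call e.g. pytesseract.image_to_string(image)."""
    return image.get("text", "")

def synthesize_speech(text):
    """TTS step: turn text into an audio signal (stand-in returns a label).
    On the device this output would be amplified and sent to the speaker."""
    return f"<audio:{text}>"

def read_aloud(camera):
    """Run the full pipeline once and return the audio for playback."""
    image = capture_image(camera)
    text = extract_text(image)
    return synthesize_speech(text)

# Example with a fake camera that "sees" a document containing one line:
fake_camera = lambda: {"text": "Hello, world"}
print(read_aloud(fake_camera))  # <audio:Hello, world>
```

Structuring the device as independent capture, OCR, and synthesis stages mirrors the abstract's step-by-step description and lets each stage be swapped (e.g. a different OCR engine) without touching the rest.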