Machine Translation of English Videos to Indian Regional Languages using Open Innovation

Srikar Kashyap Pulipaka, Chaitanya Krishna Kasaraneni, Venkata Naga Sandeep Vemulapalli, Surya Sai Mourya Kosaraju
{"title":"Machine Translation of English Videos to Indian Regional Languages using Open Innovation","authors":"Srikar Kashyap Pulipaka, Chaitanya Krishna Kasaraneni, Venkata Naga Sandeep Vemulapalli, Surya Sai Mourya Kosaraju","doi":"10.1109/istas48451.2019.8937988","DOIUrl":null,"url":null,"abstract":"In spite of many languages being spoken in India, it is difficult for the people to understand foreign languages like English, Spanish, Italian, etc. The recognition and synthesis of speech are prominent emerging technologies in natural language processing and communication domains. This paper aims to leverage the open source applications of these technologies, machine translation, text-to-speech system (TTS), and speech-to-text system (STT) to convert available online resources to Indian languages. This application takes an English language video as an input and separates the audio from video. It then divides the audio file into several smaller chunks based on the timestamps. These audio chunks are then individually converted into text using IBM Watson's speech-to-text (STT) module. The obtained text chunks are then concatenated and passed to Google's machine translate API for conversion to the requested Indian language. After this translation, a TTS system is required to convert the text into the desired audio output. Not many open source TTS systems are available for Indian regional languages. One such available application is the flite engine (a lighter version of Festival engine developed by Prof. Alan Black at Carnegie Mellon University (CMU)). This flite engine is used as TTS for generating audio from translated text. The accuracy of the application developed can be as high as 91 percent for a single video and averages about 79 percent. This accuracy is verified by comparing naturality of the audio with the general spoken language. 
This application is beneficial to visually impaired people as well as individuals who are not capable of reading text to acquire knowledge in their native language. In future, this application aims to achieve ubiquitous communication enabling people of different regions to communicate with each other breaking the language barriers.","PeriodicalId":201396,"journal":{"name":"2019 IEEE International Symposium on Technology and Society (ISTAS)","volume":"301 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Symposium on Technology and Society (ISTAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/istas48451.2019.8937988","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Although many languages are spoken in India, many people find it difficult to understand foreign languages such as English, Spanish, and Italian. Speech recognition and speech synthesis are prominent emerging technologies in the natural language processing and communication domains. This paper leverages open source applications of these technologies — machine translation, text-to-speech (TTS), and speech-to-text (STT) — to convert available online resources into Indian languages. The application takes an English-language video as input and separates the audio from the video. It then divides the audio file into several smaller chunks based on timestamps. These audio chunks are individually converted into text using IBM Watson's speech-to-text (STT) module. The resulting text chunks are concatenated and passed to Google's machine translation API for conversion into the requested Indian language. After translation, a TTS system is required to convert the text into the desired audio output. Few open source TTS systems are available for Indian regional languages; one such application is the Flite engine, a lightweight version of the Festival engine developed by Prof. Alan Black at Carnegie Mellon University (CMU). The Flite engine is used as the TTS for generating audio from the translated text. The accuracy of the developed application can be as high as 91 percent for a single video and averages about 79 percent. This accuracy is verified by comparing the naturalness of the audio with general spoken language. The application is beneficial to visually impaired people, as well as to individuals who cannot read text, for acquiring knowledge in their native language. In the future, this application aims to achieve ubiquitous communication, enabling people of different regions to communicate with each other by breaking language barriers.
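The pipeline described in the abstract — split the extracted audio into timestamped chunks, transcribe each chunk, concatenate, translate, then synthesize speech — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cloud-service calls are stubs, and a real system would replace them with the IBM Watson STT SDK, Google's translation API, and the CMU Flite binary.

```python
def split_audio(duration_s: float, chunk_s: float = 30.0):
    """Divide the audio timeline into (start, end) chunks, mirroring the
    paper's timestamp-based splitting of the extracted audio track."""
    chunks = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks

def transcribe(chunk):
    # Stub for IBM Watson speech-to-text on one audio chunk.
    return f"<text {chunk[0]:.0f}-{chunk[1]:.0f}s>"

def translate(text: str, target: str) -> str:
    # Stub for Google's machine translation API.
    return f"[{target}] {text}"

def synthesize(text: str) -> bytes:
    # Stub for the Flite TTS engine; real code would emit a WAV file.
    return b"AUDIO:" + text.encode("utf-8")

def pipeline(duration_s: float, target_lang: str) -> bytes:
    chunks = split_audio(duration_s)
    # Transcribe each chunk individually, then concatenate the text.
    transcript = " ".join(transcribe(c) for c in chunks)
    translated = translate(transcript, target_lang)
    return synthesize(translated)
```

For a 65-second audio track, `split_audio(65.0)` yields three chunks — `(0, 30)`, `(30, 60)`, and `(60, 65)` — each of which would be sent to the STT service independently before the text is rejoined for translation.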