Speech Recorder and Translator using Google Cloud Speech-to-Text and Translation

Journal of IT in Asia; published 2021-11-30; DOI: 10.33736/jita.2815.2021

Hui Hui Wang


Citations: 1

Abstract

YouTube, the most popular video website, has about 2 billion users worldwide who speak and understand different languages. Subtitles are essential for these users to follow a video's message. However, not all video owners provide subtitles, which makes it difficult for potential audiences to understand the video content. This study therefore proposed a speech recorder and translator to address this problem. The general concept was to combine Automatic Speech Recognition (ASR) and translation technologies to recognize the spoken content of a video and translate it into other languages. The paper compared and discussed three ASR technologies: Google Cloud Speech-to-Text, Limecraft Transcriber, and VoxSigma. The proposed system used Google Cloud Speech-to-Text because it supports more languages than Limecraft Transcriber and VoxSigma and is more flexible to use together with Google Cloud Translation. The paper also reports a questionnaire on the crucial features of a speech recorder and translator, in which a total of 19 university students participated. Most respondents stated that high translation accuracy is vital for the proposed system. The paper also discussed related work: a study that compared speech recognition of ordinary voices with that of speech-impaired voices, using a mobile application to record acoustic voice input. In contrast to that existing mobile app, this project proposed a web application, which makes it a different and new study, especially in terms of development and user experience. The project developed the proposed system successfully. The results showed that Google Cloud Speech-to-Text and Translation were reliable for video translation. However, the system could not recognize speech when the background music was too loud, and it also had difficulty with direct translation, which remained challenging. Future research may therefore need a custom-trained model. In conclusion, the proposed system contributes a new idea: a web application to overcome the language barrier on video-watching platforms.
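The abstract describes the system only at the level of "ASR followed by translation". The sketch below shows one way such a two-stage pipeline could be wired, assuming that the recognizer returns timed segments and that the output is rendered as SRT subtitles; neither assumption comes from the paper, and the names `Segment`, `translate_speech`, `srt_time`, and `to_srt` are hypothetical. The two services are passed in as plain functions, so the sketch runs offline with stand-ins. In a real deployment, `recognize` could wrap `SpeechClient.recognize` from the `google-cloud-speech` library (with `enable_word_time_offsets` set in the `RecognitionConfig` to obtain timings), and `translate` could wrap `translate_v2.Client.translate(text, target_language=...)`, reading the `translatedText` field of its result.

```python
from dataclasses import dataclass
from typing import Callable, List


# A recognized speech segment: start/end times in seconds plus its text.
@dataclass
class Segment:
    start: float
    end: float
    text: str


# Hypothetical service interfaces (assumptions, not the paper's code):
# a recognizer maps raw audio bytes to timed segments, and a translator
# maps (text, target_language) to translated text.
Recognizer = Callable[[bytes], List[Segment]]
Translator = Callable[[str, str], str]


def translate_speech(audio: bytes, recognize: Recognizer,
                     translate: Translator, target: str) -> List[Segment]:
    """Run ASR first, then translate each recognized segment, keeping its timing."""
    return [Segment(s.start, s.end, translate(s.text, target))
            for s in recognize(audio)]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SRT blocks separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n{seg.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Offline stand-ins for the cloud services (hypothetical).
    def fake_recognize(audio: bytes) -> List[Segment]:
        return [Segment(0.0, 2.5, "hello everyone"),
                Segment(2.5, 5.0, "welcome to my channel")]

    def fake_translate(text: str, target: str) -> str:
        return f"[{target}] {text}"

    print(to_srt(translate_speech(b"", fake_recognize, fake_translate, "ms")))
```

Keeping the timings attached to each segment is what lets the translated text be shown in sync with the video. Translating segment by segment does, however, give the translator less context per call, which may aggravate the direct-translation problem the paper reports.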


