Myanmar Text-to-Speech System based on Tacotron (End-to-End Generative Model)
Yuzana Win, Htoo Pyae Lwin, Tomonari Masada
2020 International Conference on Information and Communication Technology Convergence (ICTC), 2020-10-21
DOI: 10.1109/ICTC49870.2020.9289277
Citation count: 1
Abstract
The main motivation of this paper is to improve the naturalness of Myanmar text-to-speech (TTS) synthesis using Tacotron, an end-to-end generative model. We introduce an open-source implementation of a Myanmar TTS system that produces highly natural-sounding speech. The work consists of four main parts: speech corpus creation, data pre-processing, application of the end-to-end generative model, and speech synthesis. First, for corpus creation, we develop a speech corpus of 8k sentences drawn from a large set of news articles, novels, daily-use sentences, and travel-related expressions. Second, for data pre-processing, we use a syllable segmenter and a text normalizer. Third, we apply Tacotron, which synthesizes speech directly from a sequence of text characters. Finally, we use the Griffin-Lim algorithm to convert the spectrogram predicted by Tacotron into the output speech waveform. For subjective evaluation, we compare our synthesized speech with the original recorded speech in terms of both intelligibility and naturalness using the mean opinion score (MOS). The experimental results show that our synthesized speech is comparable to that of similar state-of-the-art synthesizers for other languages.
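The abstract does not state the rules its syllable segmenter uses. As a hedged illustration only, the sketch below shows one common rule-based approach to Myanmar syllable segmentation, similar in spirit to open-source regex-based syllable breakers. A new syllable starts at a consonant unless that consonant is stacked under the previous one (preceded by the virama U+1039) or closes the current syllable (followed by the asat U+103A, optionally after the dot below U+1037). The function name `segment_syllables` and the exact character classes are assumptions for illustration. The sketch assumes Unicode (not Zawgyi) input and does not handle independent vowels, digits, or Latin text specially.

```python
import re

# Assumed character classes for Unicode-encoded Myanmar text (not Zawgyi).
CONSONANTS = "\u1000-\u1021"   # consonants, U+1000 .. U+1021
ASAT = "\u103a"                # asat: marks a syllable-final consonant
VIRAMA = "\u1039"              # virama: stacks the next consonant under this one
DOT_BELOW = "\u1037"           # dot below: may be stored before the asat
PUNCT = "\u104a\u104b"         # Myanmar comma and full stop

# A syllable boundary is placed before:
#   - a consonant that is not stacked (no virama before it) and not
#     syllable-final (no asat/virama after it, optionally after a dot below), or
#   - a Myanmar punctuation mark.
_BREAK = re.compile(
    rf"((?<!{VIRAMA})[{CONSONANTS}](?!{DOT_BELOW}?[{ASAT}{VIRAMA}])|[{PUNCT}])"
)


def segment_syllables(text: str) -> list[str]:
    """Hypothetical helper: split Myanmar text into syllables.

    Inserts a space before every syllable-initial character, then splits
    on whitespace, so existing spaces also act as separators.
    """
    marked = _BREAK.sub(r" \1", text)
    return marked.split()
```

For example, the word မြန်မာ ("Myanmar") splits into မြန် and မာ. The greeting မင်္ဂလာပါ splits into မင်္ဂ, လာ, and ပါ, and the stacked ဂ stays in the first syllable. A production segmenter would likely need extra rules for independent vowels, numerals, and mixed-script text.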