End-to-End Speech Synthesis for Bangla with Text Normalization

2018 5th International Conference on Computational Science/Intelligence and Applied Informatics (CSII). Pub Date: 2018-07-01. DOI: 10.1109/CSII.2018.00019

T. Pial, Shahreen Salim Aunti, Shabbir Ahmed, Hasnain Heickal


Citations: 0

Abstract



Text-to-speech synthesis is a well-researched area, yet no system developed so far can claim to be as convincing as a human voice. In the context of speech synthesis, an end-to-end system is one that can synthesize speech from text using training data as minimal as transcribed audio, without any language-specific knowledge or phoneme dictionaries. An end-to-end system should nevertheless be able to integrate language-specific rules to improve its performance. In this paper, we propose an end-to-end speech synthesis system for Bangla (also known as Bengali) that uses a minimal front end and a neural network as its statistical parametric model. We also propose a Text Normalization Procedure (TNP) for Bangla and incorporate it into the end-to-end system. We conducted extensive experiments using different models. Feedback from the participants shows that they responded more positively to the system when TNP was incorporated. A Wilcoxon signed-rank test was conducted to validate these results; the probability that the results arose from experimental error rather than from TNP was calculated to be less than 5%.
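
To make the significance check concrete, here is a minimal sketch of a Wilcoxon signed-rank test on paired listener ratings. The function name `wilcoxon_signed_rank`, its argument names, and the ratings in the example are hypothetical. The abstract does not report the study's data, the rating scale, or whether the test was one- or two-sided, so this only illustrates the test itself and is not the authors' analysis.

```python
from itertools import groupby


def wilcoxon_signed_rank(with_tnp, without_tnp):
    """Return (W_plus, W_minus, exact two-sided p-value) for paired scores.

    Hypothetical helper: each position holds one listener's rating of the
    system with and without text normalization.
    """
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [a - b for a, b in zip(with_tnp, without_tnp) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences; tied values share their average rank.
    # Ranks are stored doubled so that average ranks remain integers.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    doubled_ranks = [0] * n
    pos = 0
    for _, group in groupby(order, key=lambda i: abs(diffs[i])):
        idx = list(group)
        doubled_avg = (pos + 1) + (pos + len(idx))  # first rank + last rank
        for i in idx:
            doubled_ranks[i] = doubled_avg
        pos += len(idx)

    total = sum(doubled_ranks)
    w_plus = sum(r for r, d in zip(doubled_ranks, diffs) if d > 0)
    w_minus = total - w_plus

    # Exact null distribution: under H0 every rank is equally likely to be
    # positive or negative, so count the sign assignments for each rank sum.
    counts = [1] + [0] * total
    for r in doubled_ranks:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]

    smaller = min(w_plus, w_minus)
    p_value = min(1.0, 2 * sum(counts[: smaller + 1]) / 2 ** n)
    return w_plus / 2, w_minus / 2, p_value


if __name__ == "__main__":
    # Made-up ratings from eight listeners, for illustration only.
    rated_with_tnp = [2, 3, 4, 5, 6, 7, 8, 9]
    rated_without_tnp = [1, 1, 1, 1, 1, 1, 1, 1]
    print(wilcoxon_signed_rank(rated_with_tnp, rated_without_tnp))
```

A p-value below 0.05 corresponds to the "less than 5%" threshold quoted in the abstract. The exact enumeration used here suits small samples; larger studies typically rely on a normal approximation or a statistics library.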
