利用韵律信息改进基于hmm的越南语语音合成的自然度

The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF) Pub Date : 2013-11-01 DOI:10.1109/RIVF.2013.6719907

Thanh-Son Phan, T. Duong, Anh-Tuan Dinh, T. Vu, C. Luong

{"title":"利用韵律信息改进基于hmm的越南语语音合成的自然度","authors":"Thanh-Son Phan, T. Duong, Anh-Tuan Dinh, T. Vu, C. Luong","doi":"10.1109/RIVF.2013.6719907","DOIUrl":null,"url":null,"abstract":"Natural-sounding synthesized speech is goal of HMM-based Text-to-Speech systems. Besides using context dependent tri-phone units from a large corpus speech database, many prosody features have been used in full-context labels to improve naturalness of HMM-based Vietnamese synthesizer. In the prosodic specification, tone, part-of-speech (POS) and intonation information are considered not as important as positional information. Context-dependent information includes phoneme sequence as well as prosodic information because the naturalness of synthetic speech highly depends on the prosody such as pause, tone, intonation pattern, and segmental duration. In this paper, we propose decision tree questions that use context-dependent tones and investigate the impact of POS and intonation tagging on the naturalness of HMM-based voice. Experimental results show that our proposed method can improve naturalness of a HMM-based Vietnamese TTS through objective evaluation and MOS test.","PeriodicalId":121216,"journal":{"name":"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)","volume":"131 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Improvement of naturalness for an HMM-based Vietnamese speech synthesis using the prosodic information\",\"authors\":\"Thanh-Son Phan, T. Duong, Anh-Tuan Dinh, T. Vu, C. Luong\",\"doi\":\"10.1109/RIVF.2013.6719907\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Natural-sounding synthesized speech is goal of HMM-based Text-to-Speech systems. Besides using context dependent tri-phone units from a large corpus speech database, many prosody features have been used in full-context labels to improve naturalness of HMM-based Vietnamese synthesizer. In the prosodic specification, tone, part-of-speech (POS) and intonation information are considered not as important as positional information. Context-dependent information includes phoneme sequence as well as prosodic information because the naturalness of synthetic speech highly depends on the prosody such as pause, tone, intonation pattern, and segmental duration. In this paper, we propose decision tree questions that use context-dependent tones and investigate the impact of POS and intonation tagging on the naturalness of HMM-based voice. Experimental results show that our proposed method can improve naturalness of a HMM-based Vietnamese TTS through objective evaluation and MOS test.\",\"PeriodicalId\":121216,\"journal\":{\"name\":\"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)\",\"volume\":\"131 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RIVF.2013.6719907\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RIVF.2013.6719907","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

自然的合成语音是基于hmm的文本到语音系统的目标。除了从大型语料库语音数据库中使用上下文相关的三电话单元外，还在全上下文标签中使用了许多韵律特征，以提高基于hmm的越南语合成器的自然度。在韵律规范中，声调、词性和语调信息被认为不如位置信息重要。上下文相关信息包括音素序列和韵律信息，因为合成语音的自然程度高度依赖于韵律，如停顿、音调、语调模式和片段持续时间。在本文中，我们提出了使用上下文相关音调的决策树问题，并研究了词性标注和语调标注对基于hmm的语音自然度的影响。实验结果表明，通过客观评价和MOS测试，我们提出的方法可以提高基于hmm的越南语TTS的自然度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Improvement of naturalness for an HMM-based Vietnamese speech synthesis using the prosodic information

Natural-sounding synthesized speech is goal of HMM-based Text-to-Speech systems. Besides using context dependent tri-phone units from a large corpus speech database, many prosody features have been used in full-context labels to improve naturalness of HMM-based Vietnamese synthesizer. In the prosodic specification, tone, part-of-speech (POS) and intonation information are considered not as important as positional information. Context-dependent information includes phoneme sequence as well as prosodic information because the naturalness of synthetic speech highly depends on the prosody such as pause, tone, intonation pattern, and segmental duration. In this paper, we propose decision tree questions that use context-dependent tones and investigate the impact of POS and intonation tagging on the naturalness of HMM-based voice. Experimental results show that our proposed method can improve naturalness of a HMM-based Vietnamese TTS through objective evaluation and MOS test.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

The 2013 RIVF International Conference on Computing & Communication Technologies - Research, Innovation, and Vision for Future (RIVF)

自引率

0.00%

发文量