A Speaker-Adaptive HMM-based Vietnamese Text-to-Speech System

Duy Khanh Ninh
Published in: 2019 11th International Conference on Knowledge and Systems Engineering (KSE)
Publication date: 2019-10-01
DOI: 10.1109/KSE.2019.8919326 (https://doi.org/10.1109/KSE.2019.8919326)
Citations: 4

Abstract

This paper describes the first attempt at developing a Vietnamese HMM-based Text-to-Speech system using the speaker-adaptive approach. Although speaker-dependent systems have been widely built, no speaker-adaptive system has been developed for Vietnamese so far. We collected speech data from several native Vietnamese speakers and employed state-of-the-art speech analysis, model training, and speaker adaptation techniques to develop the system. We also performed perceptual experiments to compare the quality of speaker-adapted (SA) voices built on the average voice model with that of speaker-dependent (SD) voices built on SD models, and to confirm the effects of contextual features, including word boundary (WB) and part-of-speech (POS), on the quality of synthetic speech. Evaluation results show that SA voices have significantly higher naturalness than SD voices when both use the same limited contextual feature set excluding WB and POS. In addition, SA voices trained with the limited contextual features excluding WB and POS still have better quality than SD voices trained with the full contextual features including WB and POS. These results show the robustness of the speaker-adaptive approach over the speaker-dependent approach for Vietnamese statistical parametric speech synthesis.