A study on adaptation of speaking rate-dependent hierarchical prosodic model for Chinese dialect TTS
Author: Chen-Yu Chiang
Venue: 2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE)
Publication date: 2015-10-01
DOI: https://doi.org/10.1109/ICSDA.2015.7357862
This paper presents a new approach to developing a speaking rate (SR)-dependent hierarchical prosodic model (SR-HPM) for use in an SR-controlled TTS system for Taiwanese (Min-Nan), a resource-limited Chinese dialect. The main challenge is that an SR-HPM is difficult to build directly from a Taiwanese database with sparse coverage of linguistic context, prosody, and SR. Exploiting the property that Taiwanese and Mandarin Chinese share the same linguistic characteristics, we propose an adaptation approach that constructs a Taiwanese SR-HPM from a small Taiwanese corpus of fast speech with the help of an existing Mandarin SR-HPM, which is well trained on a large Mandarin corpus whose utterances cover a wide range of SR. The proposed method has two parts: adaptation of the normalization functions (NFs) and an adaptive prosody labeling and modeling (PLM) algorithm. Both parts are formulated as MAP estimations with the existing Mandarin SR-HPM serving as an informative prior. The effectiveness of the approach was evaluated in a prosody generation experiment for Taiwanese TTS using a small corpus of fast speech with SR between 4.5 and 6.8 syllables/sec. Experimental results showed that the generated prosody sounded quite natural over a wider SR range of 3.4-6.8 syllables/sec.
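The idea of MAP estimation with an informative prior can be illustrated with a minimal sketch. It is not the paper's actual formulation: it uses the textbook conjugate-prior MAP estimate of a single Gaussian mean, and the function name `map_adapt_mean`, the prior weight `tau`, and the sample values are all hypothetical. The prior mean stands in for a parameter taken from the well-trained Mandarin model, and the samples stand in for observations from the small Taiwanese corpus.

```python
from statistics import fmean


def map_adapt_mean(prior_mean, samples, tau=10.0):
    """MAP estimate of a Gaussian mean under a conjugate Gaussian prior.

    prior_mean: parameter value from the source model (assumed here to play
                the role of the Mandarin SR-HPM prior).
    samples:    observations from the small target corpus.
    tau:        hypothetical prior weight, i.e. how many target observations
                the prior is worth.
    """
    n = len(samples)
    if n == 0:
        # With no target data the estimate falls back to the prior.
        return prior_mean
    sample_mean = fmean(samples)
    # Interpolate between prior and data, weighted by tau and n.
    return (tau * prior_mean + n * sample_mean) / (tau + n)


# Hypothetical numbers: the prior says 0.0, ten target samples all say 1.0.
adapted = map_adapt_mean(0.0, [1.0] * 10, tau=10.0)
```

With few target samples the estimate stays close to the prior, and as more target data arrive it moves toward the sample mean, which is the behavior one would want when adapting from a data-rich source to a data-sparse target.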