Break prediction of prosody for Hakka's TTS systems based on data mining approaches
Authors: Feng-Long Huang, Neng-Huang Pan, Ming-Shing Yu, Jun-Yi Wu
Published in: 2011 International Conference on Machine Learning and Cybernetics, 2011-07-10
DOI: 10.1109/ICMLC.2011.6016704
Cited by: 2
Abstract
This paper addresses prosody generation for the Hakka language using data mining approaches and implements the resulting TTS system on the Internet. Our system consists of four components: 1) text analysis, 2) Mandarin-to-Hakka word translation, 3) prosody prediction, and 4) speech generation. More than 2,427 monosyllabic speech units and 2,234 word speech units of Hakka, together with several silences of various durations, were recorded as the basic units for speech synthesis. We focus on inserting breaks into the synthesized speech, with emphasis on predicting the type of each break. Three types of break between words are distinguished: major break, minor break, and no break. We train a break model and predict breaks with two data mining approaches, a Bayesian network (BN) and a CART classifier. The best precision rate on the test data, 80.17%, was achieved with CART. Fourteen students familiar with Hakka evaluated the prosody quality of the synthesized speech and gave it an average score of 7.54 on a 10-point scale. Based on this evaluation, we conclude that our system can synthesize clear and natural Hakka speech.
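To make the break-type classification concrete, here is a minimal CART-style decision tree in pure Python. It is a sketch under stated assumptions, not the authors' implementation: the features (`punct`, the punctuation after a word, and `pos`, the word's part-of-speech tag) and the toy samples are hypothetical, since the abstract does not list the actual feature set or data. Splits are binary equality tests chosen by lowest weighted Gini impurity; full CART can also group several categorical values on one side of a split and prunes the grown tree, which this sketch omits. `label_accuracy` assumes that "precision rate" means the share of word boundaries whose break type is predicted correctly.

```python
from collections import Counter

# Hypothetical toy data (not from the paper): features of the boundary after
# a word, paired with one of the three break labels.
TRAIN = [
    ({"punct": "period", "pos": "N"}, "major"),
    ({"punct": "period", "pos": "V"}, "major"),
    ({"punct": "comma", "pos": "N"}, "minor"),
    ({"punct": "comma", "pos": "V"}, "minor"),
    ({"punct": "none", "pos": "V"}, "minor"),
    ({"punct": "none", "pos": "N"}, "no_break"),
    ({"punct": "none", "pos": "P"}, "no_break"),
]


def gini(labels):
    """Gini impurity of a list of labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(samples):
    """Return the (feature, value) test with the lowest weighted Gini, or None."""
    best, best_score = None, gini([y for _, y in samples])
    for f in sorted({f for x, _ in samples for f in x}):
        for v in sorted({x[f] for x, _ in samples}):
            yes = [y for x, y in samples if x[f] == v]
            no = [y for x, y in samples if x[f] != v]
            if not yes or not no:
                continue  # a split must send samples both ways
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(samples)
            if score < best_score:
                best, best_score = (f, v), score
    return best


def build_tree(samples, depth=0, max_depth=4):
    """Grow a binary tree; leaves are labels, inner nodes are (feature, value, yes, no)."""
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(samples)
    if split is None:
        return majority  # no split lowers impurity
    f, v = split
    yes = [(x, y) for x, y in samples if x[f] == v]
    no = [(x, y) for x, y in samples if x[f] != v]
    return (f, v, build_tree(yes, depth + 1, max_depth),
            build_tree(no, depth + 1, max_depth))


def predict(tree, x):
    """Follow the equality tests down to a leaf label."""
    while isinstance(tree, tuple):
        f, v, yes, no = tree
        tree = yes if x.get(f) == v else no
    return tree


def label_accuracy(tree, samples):
    """Share of samples whose break label is predicted correctly."""
    return sum(predict(tree, x) == y for x, y in samples) / len(samples)


tree = build_tree(TRAIN)
print(predict(tree, {"punct": "comma", "pos": "P"}))  # minor (unseen combination)
print(label_accuracy(tree, TRAIN))  # 1.0 on the toy training set
```

A real evaluation would, of course, report accuracy on held-out test sentences rather than on the training samples.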
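For the Bayesian network side, the simplest network for this task is naive Bayes, in which the break label is the single parent of every feature node. The abstract does not describe the structure of the authors' BN, so the pure-Python sketch below is only an illustration of the BN family under assumptions: the features (`punct`, `pos`) and toy samples are hypothetical, and add-one (Laplace) smoothing is used so that a feature value never seen with a class does not drive that class's probability to zero.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data (not from the paper): boundary features -> break label.
TRAIN = [
    ({"punct": "period", "pos": "N"}, "major"),
    ({"punct": "period", "pos": "V"}, "major"),
    ({"punct": "comma", "pos": "N"}, "minor"),
    ({"punct": "comma", "pos": "V"}, "minor"),
    ({"punct": "none", "pos": "V"}, "minor"),
    ({"punct": "none", "pos": "N"}, "no_break"),
    ({"punct": "none", "pos": "P"}, "no_break"),
]


def train_nb(samples):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y for _, y in samples)
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of values
    vocab = defaultdict(set)  # feature -> set of values observed in training
    for x, y in samples:
        for f, v in x.items():
            value_counts[(y, f)][v] += 1
            vocab[f].add(v)
    return class_counts, value_counts, vocab


def predict_nb(model, x, alpha=1.0):
    """Return the label maximising log P(y) + sum_f log P(x_f | y), Laplace-smoothed."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior of the break label
        for f, v in x.items():
            if f not in vocab:
                continue  # ignore features never seen in training
            count = value_counts[(y, f)][v]
            score += math.log((count + alpha) / (n_y + alpha * len(vocab[f])))
        if score > best_score:
            best_label, best_score = y, score
    return best_label


model = train_nb(TRAIN)
print(predict_nb(model, {"punct": "comma", "pos": "V"}))  # minor
```

Working in log space avoids numerical underflow when many feature probabilities are multiplied together.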