{"title":"An acoustic and articulatory knowledge integrated method for improving synthetic Mandarin speech's fluency","authors":"H. Gu, Kuo-Hsian Wang","doi":"10.1109/CHINSL.2004.1409622","DOIUrl":null,"url":null,"abstract":"In synthetic Mandarin speech, discontinuity of formant traces at syllable boundaries is a key factor that lowers the fluency level. Therefore, we study an acoustic and articulatory knowledge integrated method to solve this discontinuity problem. First, representative trisyllable contexts are selected and their signals are recorded. The signal of the middle syllable of each trisyllable pronunciation is then extracted to make a synthesis unit. To select a synthesis unit among multiple candidates, a distance function is defined to measure the spectral similarity between two synthesis units to be concatenated. In addition, several linking-restriction rules are derived, according to articulatory knowledge, to prevent some synthesis units being linked into a sequence. Then, a globally best synthesis-unit sequence is searched by using a dynamic programming based algorithm. When this method is applied, the formant traces at syllable boundaries become smoother. Also, subjective evaluation shows that the fluency level of synthetic Mandarin speech can indeed be improved a lot.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 International Symposium on Chinese Spoken Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CHINSL.2004.1409622","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
In synthetic Mandarin speech, discontinuity of formant traces at syllable boundaries is a key factor that lowers the fluency level. Therefore, we study an acoustic and articulatory knowledge integrated method to solve this discontinuity problem. First, representative trisyllable contexts are selected and their signals are recorded. The signal of the middle syllable of each trisyllable pronunciation is then extracted to make a synthesis unit. To select a synthesis unit among multiple candidates, a distance function is defined to measure the spectral similarity between two synthesis units to be concatenated. In addition, several linking-restriction rules are derived, according to articulatory knowledge, to prevent some synthesis units being linked into a sequence. Then, a globally best synthesis-unit sequence is searched by using a dynamic programming based algorithm. When this method is applied, the formant traces at syllable boundaries become smoother. Also, subjective evaluation shows that the fluency level of synthetic Mandarin speech can indeed be improved a lot.