Authors: Zhenye Gan, Zhenwen Wang, Hongwu Yang
Venue: 2014 International Conference on Orange Technologies
Published: 2014-11-20
DOI: 10.1109/ICOT.2014.6956607 (https://doi.org/10.1109/ICOT.2014.6956607)
Realizing Tibetan Lhasa speech concatenation synthesis system based on a large corpus
This paper presents a method for realizing concatenative synthesis of Tibetan Lhasa speech based on a large corpus. The corpus is built from an analysis of the characteristics of the Tibetan Lhasa dialect. A grapheme-to-phoneme conversion method converts Tibetan sentences into Pinyin sequences based on the Speech Assessment Methods Phonetic Alphabet (SAMPA). First, Tibetan text is converted to Pinyin sequences using the SAMPA-T transformation method. Then, with Tibetan acoustic finals and syllables as units, a Classification and Regression Tree (CART) is built from the spectral distances between candidate units and a context-dependent question set. The CART is used to choose the acoustic finals and syllables that best match the context information. Finally, the speech is synthesized by waveform concatenation. Tests show that the mean opinion score (MOS) of the synthesized Tibetan Lhasa speech is 3.9 with acoustic finals as units and 4.1 with syllables as units, so syllable units give better quality than acoustic finals.
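The selection and concatenation steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the CART built from spectral distances and question sets is replaced here by a much simpler context-mismatch count, and the `Unit` class, the function names, the `"sil"` boundary label, and the linear cross-fade at each joint are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    """One recorded candidate unit (an acoustic final or a syllable)."""
    label: str            # hypothetical SAMPA-style label of the unit
    prev_label: str       # label that preceded the unit in the recording
    next_label: str       # label that followed the unit in the recording
    samples: List[float]  # waveform samples of the unit


def context_cost(unit: Unit, prev_label: str, next_label: str) -> int:
    """Count how many neighbouring labels differ from the target context."""
    return int(unit.prev_label != prev_label) + int(unit.next_label != next_label)


def select_units(targets: List[str], inventory: Dict[str, List[Unit]]) -> List[Unit]:
    """For each target label, pick the candidate with the lowest context cost.

    Simplification: a real system would descend a CART to a leaf cluster
    instead of scoring every candidate directly.
    """
    chosen: List[Unit] = []
    for i, label in enumerate(targets):
        prev_label = targets[i - 1] if i > 0 else "sil"
        next_label = targets[i + 1] if i + 1 < len(targets) else "sil"
        best = min(inventory[label],
                   key=lambda u: context_cost(u, prev_label, next_label))
        chosen.append(best)
    return chosen


def concatenate(units: List[Unit], overlap: int = 2) -> List[float]:
    """Join the unit waveforms, cross-fading up to `overlap` samples per joint."""
    out: List[float] = []
    for unit in units:
        seg = unit.samples
        n = min(overlap, len(out), len(seg))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # fade-in weight of the incoming unit
            out[start + k] = (1 - w) * out[start + k] + w * seg[k]
        out.extend(seg[n:])
    return out
```

Calling `select_units` with a list of target labels and an inventory keyed by label returns one `Unit` per target, and `concatenate` turns that list into a single sample sequence. Using syllables rather than acoustic finals as units only changes what the labels and the inventory contain, not the procedure.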