HMM-Based Mixed-Language (Mandarin-English) Speech Synthesis
Yao Qian, Houwei Cao, F. Soong
2008 6th International Symposium on Chinese Spoken Language Processing, 2008-12-30
DOI: 10.1109/CHINSL.2008.ECP.15

English words or short phrases embedded in Mandarin utterances have become increasingly common among bilingually educated speakers, such as college students in China. Accordingly, it is highly desirable that TTS systems synthesize mixed-language speech properly. Recently, we proposed an HMM-based bilingual TTS that synthesizes a target language when only monolingual source-language recordings from a speaker are available. In this paper, we extend it to synthesize mixed-language sentences. A cross-language state mapping is first established between the decision trees built from the English and Mandarin recordings of a bilingual speaker. Via this mapping, English words or phrases embedded in Mandarin sentences can then be synthesized. The bilingual state mapping is then applied to a monolingual speaker to perform mixed-language synthesis. Perceptual test results show (1) decent intelligibility, confirmed by an English word transcription accuracy of 86%, and (2) good speech quality, with a mean opinion score (MOS) of 3.2.
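
The cross-language state mapping described above can be illustrated with a small sketch. The abstract does not say how the nearest state is chosen, so this example assumes that each decision-tree leaf is summarized by a diagonal-covariance Gaussian and that every English leaf is mapped to the Mandarin leaf with the smallest symmetric Kullback-Leibler divergence. The names `symmetric_kl` and `map_states`, the leaf labels, and the toy parameters are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# A leaf state summarized as (means, variances) of a diagonal Gaussian.
# This representation is an assumption made for illustration.
Gaussian = Tuple[List[float], List[float]]


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        diff2 = (mp - mq) ** 2
        kl_pq = 0.5 * (math.log(vq / vp) + (vp + diff2) / vq - 1.0)
        kl_qp = 0.5 * (math.log(vp / vq) + (vq + diff2) / vp - 1.0)
        total += kl_pq + kl_qp
    return total


def map_states(source: Dict[str, Gaussian],
               target: Dict[str, Gaussian]) -> Dict[str, str]:
    """Map each source-language leaf to its nearest target-language leaf."""
    return {
        s_name: min(target, key=lambda t_name: symmetric_kl(s_dist, target[t_name]))
        for s_name, s_dist in source.items()
    }


# Toy leaf distributions (made-up numbers, two feature dimensions each).
english_leaves = {
    "en_s1": ([0.0, 1.0], [1.0, 1.0]),
    "en_s2": ([5.0, 5.0], [1.0, 2.0]),
}
mandarin_leaves = {
    "zh_s1": ([0.2, 0.9], [1.0, 1.0]),
    "zh_s2": ([4.8, 5.1], [1.0, 2.0]),
}

mapping = map_states(english_leaves, mandarin_leaves)
print(mapping)
```

Under these assumptions, a mapping learned from the bilingual speaker's two decision trees could then be reused to pick, for each English state, a state from a monolingual Mandarin speaker's model. This is only one plausible reading of the abstract's final step, not a description of the authors' implementation.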