Modeling and Generating Tone Contour with Phrase Intonation for Mandarin Chinese Speech
Zhizheng Wu, Yao Qian, F. Soong, Bo Zhang
2008 6th International Symposium on Chinese Spoken Language Processing, published 2008-12-30
DOI: 10.1109/CHINSL.2008.ECP.42 (https://doi.org/10.1109/CHINSL.2008.ECP.42)
This paper models F0 contours of Mandarin Chinese speech with discrete cosine transform (DCT) representations at two levels: syllable-level tone and phrase-level intonation. Decision trees, grown under a maximum likelihood (ML) criterion and stopped by a minimum description length (MDL) criterion, cluster the very rich set of context-dependent DCT models into generalized ones, so that contexts unseen in training can be predicted robustly at test time. Additionally, we propose to generate Mandarin tone contours by jointly optimizing the syllable-level and phrase-level F0 contours in the ML sense. Experimental results on a speaker-dependent continuous speech corpus and a speaker-independent isolated speech corpus show that the proposed approach generates F0 contours with high correlation coefficients of 0.92 and 0.82, respectively, measured between the original and generated F0.
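To make the DCT representation and the correlation measure concrete, the following is a minimal sketch, not the authors' implementation. It assumes an orthonormal DCT-II, a truncation to four coefficients, and a synthetic 20-frame contour; none of these choices are stated in the abstract, and the function names (`dct_ii`, `idct_ii`, `pearson`) are hypothetical. It covers only the syllable-level parameterization and omits the decision-tree clustering and the joint syllable and phrase optimization.

```python
import math

def dct_ii(x):
    """Orthonormal DCT-II of a sequence of F0 samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct_ii(c, n):
    """Rebuild n samples from the first len(c) orthonormal DCT-II coefficients.

    Coefficients beyond len(c) are treated as zero, which smooths the contour.
    """
    out = []
    for t in range(n):
        s = 0.0
        for k, ck in enumerate(c):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * ck * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(s)
    return out

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Synthetic dipping contour in Hz (illustrative data, not from the paper).
n_frames = 20
f0 = [200.0 - 60.0 * math.sin(math.pi * t / (n_frames - 1)) + 10.0 * t / (n_frames - 1)
      for t in range(n_frames)]

# Keep only the lowest-order coefficients as the syllable's compact model
# (the order of 4 is an assumption made for this sketch).
coeffs = dct_ii(f0)[:4]
approx = idct_ii(coeffs, n_frames)

print("correlation between original and reconstructed F0:", pearson(f0, approx))
```

The same `pearson` function corresponds to the kind of evaluation the abstract reports, where the correlation is computed between original and generated F0; here it only measures how much of a smooth contour a few DCT coefficients retain.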