中文文本-语音合成的叠加韵律模型

2004 International Symposium on Chinese Spoken Language Processing Pub Date : 2004-12-15 DOI:10.1109/CHINSL.2004.1409615

G. Chen, G. Bailly, Qingfeng Liu, Ren-Hua Wang

{"title":"中文文本-语音合成的叠加韵律模型","authors":"G. Chen, G. Bailly, Qingfeng Liu, Ren-Hua Wang","doi":"10.1109/CHINSL.2004.1409615","DOIUrl":null,"url":null,"abstract":"The paper presents the application of the trainable SFC superpositional prosodic model to Chinese. Within the SFC model, prosodic parameters (F0, syllabic lengthening) are interpreted as the superposition of overlapping multiparametric contours. These contours are associated with high-level prosodic features operating at different scopes, such as tones, stress, prosodic boundary, part of speech of words, etc. Each feature label corresponds to a metalinguistic function (morphological, lexical, syntactic, attitudinal, etc.) which is represented by a neural network. The observed contour is the sum of the outputs of the corresponding neural networks. An analysis-by-synthesis scheme is implemented for automatic learning. This model works well in the concatenation of neighbored units. The RMSE of F0 prediction is 2.34 st (referenced to 200 Hz), correlation is 0.86. Perceptual experiments show that the predicted prosody is quite appropriate and fluent.","PeriodicalId":212562,"journal":{"name":"2004 International Symposium on Chinese Spoken Language Processing","volume":"94 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"A superposed prosodic model for Chinese text-to-speech synthesis\",\"authors\":\"G. Chen, G. Bailly, Qingfeng Liu, Ren-Hua Wang\",\"doi\":\"10.1109/CHINSL.2004.1409615\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The paper presents the application of the trainable SFC superpositional prosodic model to Chinese. Within the SFC model, prosodic parameters (F0, syllabic lengthening) are interpreted as the superposition of overlapping multiparametric contours. These contours are associated with high-level prosodic features operating at different scopes, such as tones, stress, prosodic boundary, part of speech of words, etc. Each feature label corresponds to a metalinguistic function (morphological, lexical, syntactic, attitudinal, etc.) which is represented by a neural network. The observed contour is the sum of the outputs of the corresponding neural networks. An analysis-by-synthesis scheme is implemented for automatic learning. This model works well in the concatenation of neighbored units. The RMSE of F0 prediction is 2.34 st (referenced to 200 Hz), correlation is 0.86. Perceptual experiments show that the predicted prosody is quite appropriate and fluent.\",\"PeriodicalId\":212562,\"journal\":{\"name\":\"2004 International Symposium on Chinese Spoken Language Processing\",\"volume\":\"94 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2004-12-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2004 International Symposium on Chinese Spoken Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CHINSL.2004.1409615\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 International Symposium on Chinese Spoken Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CHINSL.2004.1409615","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 25

摘要

本文介绍了可训练的SFC叠加韵律模型在汉语中的应用。在SFC模型中，韵律参数(F0，音节延长)被解释为重叠的多参数轮廓的叠加。这些轮廓线与不同范围的高级韵律特征有关，如音调、重音、韵律边界、词的词性等。每个特征标签对应于一个元语言功能(形态、词汇、句法、态度等)，这些功能由神经网络表示。观察到的轮廓是相应神经网络输出的和。实现了一种自动学习的综合分析方案。这个模型在相邻单元的连接中工作得很好。F0预测的RMSE为2.34 st(参考200 Hz)，相关系数为0.86。感知实验表明，预测的韵律相当恰当和流畅。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

A superposed prosodic model for Chinese text-to-speech synthesis

The paper presents the application of the trainable SFC superpositional prosodic model to Chinese. Within the SFC model, prosodic parameters (F0, syllabic lengthening) are interpreted as the superposition of overlapping multiparametric contours. These contours are associated with high-level prosodic features operating at different scopes, such as tones, stress, prosodic boundary, part of speech of words, etc. Each feature label corresponds to a metalinguistic function (morphological, lexical, syntactic, attitudinal, etc.) which is represented by a neural network. The observed contour is the sum of the outputs of the corresponding neural networks. An analysis-by-synthesis scheme is implemented for automatic learning. This model works well in the concatenation of neighbored units. The RMSE of F0 prediction is 2.34 st (referenced to 200 Hz), correlation is 0.86. Perceptual experiments show that the predicted prosody is quite appropriate and fluent.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2004 International Symposium on Chinese Spoken Language Processing

自引率

0.00%

发文量