A superposed prosodic model for Chinese text-to-speech synthesis

G. Chen, G. Bailly, Qingfeng Liu, Ren-Hua Wang
2004 International Symposium on Chinese Spoken Language Processing, published 2004-12-15
DOI: 10.1109/CHINSL.2004.1409615 (https://doi.org/10.1109/CHINSL.2004.1409615)
Citations: 25

Abstract

The paper presents the application of the trainable SFC superpositional prosodic model to Chinese. Within the SFC model, prosodic parameters (F0, syllabic lengthening) are interpreted as the superposition of overlapping multiparametric contours. These contours are associated with high-level prosodic features operating at different scopes, such as tones, stress, prosodic boundaries, and the part of speech of words. Each feature label corresponds to a metalinguistic function (morphological, lexical, syntactic, attitudinal, etc.), which is represented by a neural network. The observed contour is the sum of the outputs of the corresponding neural networks. An analysis-by-synthesis scheme is implemented for automatic learning. The model handles the concatenation of neighboring units well. The RMSE of F0 prediction is 2.34 st (semitones, referenced to 200 Hz), and the correlation is 0.86. Perceptual experiments show that the predicted prosody is quite appropriate and fluent.
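The two quantitative ideas in the abstract, summing overlapping contours and scoring F0 in semitones relative to 200 Hz, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the `(start, values)` contour representation, and the numbers are assumptions made for illustration, and the hand-written contour values stand in for what the trained neural networks would output.

```python
import math

REF_HZ = 200.0  # reference frequency stated in the abstract


def hz_to_semitones(f0_hz, ref_hz=REF_HZ):
    """Convert an F0 value in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def superpose(n_syllables, contours):
    """Sum overlapping contours into one per-syllable track.

    Each contour is a hypothetical (start, values) pair: it adds values[i]
    to syllable start + i. In the SFC model each contour would come from a
    trained network; here the values are made up.
    """
    total = [0.0] * n_syllables
    for start, values in contours:
        for offset, value in enumerate(values):
            total[start + offset] += value
    return total


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# One utterance-wide contour plus two shorter ones that overlap it.
contours = [
    (0, [1.0, 0.5, 0.0, -0.5]),  # scope: whole 4-syllable utterance
    (0, [2.0, -1.0]),            # scope: first two syllables
    (2, [1.5, -2.0]),            # scope: last two syllables
]
predicted_st = superpose(4, contours)

# Made-up observed F0 values in Hz, converted to semitones before scoring.
observed_st = [hz_to_semitones(f) for f in (240.0, 195.0, 215.0, 170.0)]

print(predicted_st)
print(rmse(predicted_st, observed_st), pearson(predicted_st, observed_st))
```

Working in semitones makes the error measure relative to pitch rather than absolute in Hz, so an octave above the 200 Hz reference (400 Hz) maps to exactly 12 st.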