A study of F0 modelling and generation with lyrics and shape characterization for singing voice synthesis

2012 8th International Symposium on Chinese Spoken Language Processing
Pub Date: 2012-12-01
DOI: 10.1109/ISCSLP.2012.6423491

Siu Wa Lee, M. Dong, Haizhou Li


Citations: 4

Abstract


A study of F0 modelling and generation with lyrics and shape characterization for singing voice synthesis

Natural pitch fluctuation is essential to the singing voice. We recently proposed a generalized F0 modelling method that uses note HMMs to model the expected F0 fluctuation under various contexts. Because F0 contours close to those of professional human singers improve perceived quality, two requirements arise: (1) accurate F0 estimation and (2) precise voiced/unvoiced decisions. In this paper, we introduce two techniques in these directions. First, the influence of lyric phonetics on singing F0 is considered, in order to capture the F0 and voicing behaviour that arises from different note-lyrics combinations. Second, the generalized F0 modelling method is extended to the frequency domain to study whether shape characterization in terms of sinusoids helps F0 estimation. Our experiments showed that using lyrics information leads to better F0 generation and improves the naturalness of synthesized singing. The frequency-domain representation is viable, but its performance is less competitive than that of the time-domain representation and requires further study.
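The abstract does not spell out how the frequency-domain representation is built, so the following is only a minimal sketch of what "shape characterization in terms of sinusoids" could look like, assuming a plain discrete Fourier transform over a single note's F0 contour. It is not the authors' method. The function names (`dft_shape`, `reconstruct`, `rmse`), the 40-frame note length, and the 50-cent vibrato contour are all hypothetical and chosen for illustration.

```python
import cmath
import math


def dft_shape(contour, n_coeffs):
    """Return the first n_coeffs DFT coefficients (low-order sinusoids) of a contour."""
    n = len(contour)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(contour)) / n
        for k in range(n_coeffs)
    ]


def reconstruct(coeffs, n):
    """Rebuild a real-valued contour of length n from low-order coefficients.

    Assumes len(coeffs) <= n // 2, so each kept bin has a distinct
    conjugate-symmetric partner that the factor of 2 accounts for.
    """
    out = []
    for t in range(n):
        value = coeffs[0].real  # mean level of the contour (DC term)
        for k, c in enumerate(coeffs[1:], start=1):
            value += 2 * (c * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(value)
    return out


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical note-level F0 contour, in cents relative to the target note:
# two vibrato cycles across the note with a 50-cent extent.
n_frames = 40
contour = [50.0 * math.sin(2 * math.pi * 2 * t / n_frames) for t in range(n_frames)]

coeffs = dft_shape(contour, n_coeffs=4)   # keep bins 0..3
approx = reconstruct(coeffs, n_frames)
error = rmse(contour, approx)             # close to zero: bin 2 holds the vibrato
```

In this toy case four coefficients describe the whole 40-frame contour because the vibrato sits in a single bin. Keeping only bins 0 and 1 would miss it entirely, which illustrates why the number of retained sinusoids matters for such a representation.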

