On the analysis and evaluation of prosody conversion techniques
Berrak Sisman, Grandee Lee, Haizhou Li, K. Tan
2017 International Conference on Asian Language Processing (IALP), published 2017-12-01
DOI: 10.1109/IALP.2017.8300542 (https://doi.org/10.1109/IALP.2017.8300542)
Citation count: 9
Abstract
Voice conversion is the process of modifying the characteristics of a source speaker, such as the spectrum and/or prosody, so that the speech sounds as if it were spoken by another speaker. In this paper, we study the evaluation of prosody transformation, in particular the evaluation of fundamental frequency (F0) conversion. F0 is an essential prosodic feature that should be handled in a comprehensive voice conversion framework. So far, converted prosody features have been evaluated mainly with the Pearson correlation coefficient (PCC) and the root mean square error (RMSE). Unfortunately, these measures do not explicitly capture the F0 alignment between the source and target signals. We believe that an evaluation measure that accounts for the time alignment of F0 is needed to provide a new perspective. We therefore study a new technique for assessing the accuracy of prosody transformation. In experiments with different prosody transformation techniques, the proposed evaluation approach yields results consistent with those of the baseline evaluation metrics.
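To make the contrast between frame-wise and alignment-aware evaluation concrete, here is a minimal Python sketch. It assumes per-frame F0 contours in Hz with 0 marking unvoiced frames, and computes the two baseline metrics named above (PCC and RMSE) over frames voiced in both contours. It also includes a dynamic time warping (DTW) cost as one possible alignment-aware measure. DTW is a swapped-in illustration, not necessarily the technique proposed in the paper, and all function names (`f0_rmse`, `f0_pcc`, `f0_dtw_cost`, `voiced_pairs`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: F0 contours are per-frame values in Hz, with 0 marking unvoiced frames.


def voiced_pairs(converted: Sequence[float], target: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the frames voiced in both contours (frame-wise metrics need equal lengths)."""
    if len(converted) != len(target):
        raise ValueError("frame-wise metrics assume equal-length, frame-aligned contours")
    pairs = [(c, t) for c, t in zip(converted, target) if c > 0 and t > 0]
    return [c for c, _ in pairs], [t for _, t in pairs]


def f0_rmse(converted: Sequence[float], target: Sequence[float]) -> float:
    """Frame-wise root mean square error (Hz) over frames voiced in both contours."""
    c, t = voiced_pairs(converted, target)
    if not c:
        raise ValueError("no frames are voiced in both contours")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, t)) / len(c))


def f0_pcc(converted: Sequence[float], target: Sequence[float]) -> float:
    """Frame-wise Pearson correlation coefficient over frames voiced in both contours."""
    c, t = voiced_pairs(converted, target)
    n = len(c)
    if n < 2:
        raise ValueError("PCC needs at least two jointly voiced frames")
    mc, mt = sum(c) / n, sum(t) / n
    cov = sum((a - mc) * (b - mt) for a, b in zip(c, t))
    sc = math.sqrt(sum((a - mc) ** 2 for a in c))
    st = math.sqrt(sum((b - mt) ** 2 for b in t))
    if sc == 0 or st == 0:
        raise ValueError("PCC is undefined for a constant contour")
    return cov / (sc * st)


def f0_dtw_cost(converted: Sequence[float], target: Sequence[float]) -> float:
    """Total DTW alignment cost (absolute Hz difference) between the voiced F0 values.

    Illustrative alignment-aware measure, not necessarily the one used in the paper.
    The contours may differ in length; DTW finds the cheapest monotonic frame alignment.
    """
    x = [v for v in converted if v > 0]
    y = [v for v in target if v > 0]
    if not x or not y:
        raise ValueError("both contours need at least one voiced frame")
    n, m = len(x), len(y)
    # d[i][j]: cheapest accumulated cost of aligning x[:i] with y[:j].
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(x[i - 1] - y[j - 1])
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


if __name__ == "__main__":
    # Hypothetical toy data: the converted contour is the target delayed by one frame.
    target = [0, 100, 120, 140, 120, 100, 0]
    converted = [0, 0, 100, 120, 140, 120, 100]
    print(f0_rmse(converted, target))      # 20.0
    print(f0_pcc(converted, target))       # 0.0
    print(f0_dtw_cost(converted, target))  # 0.0
```

In this toy case, a one-frame timing offset yields a frame-wise RMSE of 20 Hz and a PCC of 0, even though the pitch shape is reproduced exactly. The DTW cost of 0 reflects that the contours match once they are time-aligned. The sketch returns the raw accumulated cost. Dividing it by the warping-path length or the number of frames would make scores comparable across utterances of different lengths, and that choice is left open here.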