Voice Conversion of Philippine Spoken Languages using Deep Neural Networks

Michael Gian V. Gonzales, C. R. Lucas, M. G. A. Bayona, F. D. de Leon
DOI: 10.1109/ICSPC50992.2020.9305801
Published in: 2020 IEEE 8th Conference on Systems, Process and Control (ICSPC), 2020-12-11
URL: https://doi.org/10.1109/ICSPC50992.2020.9305801
Citations: 2

Abstract

Most available voice conversion systems convert only the spectral parameters of speech, such as the spectral envelope. This project developed a voice conversion system that converts both the spectral parameters and the prosodic features of speech, specifically a wavelet model of the F0 contour, to improve voice quality and naturalness. The system can be paired with text-to-speech systems to personalize and customize the speech output. It was implemented for English and for Philippine spoken languages such as Tagalog, Hiligaynon, and Cebuano. Results show that English voice conversion scored highest in naturalness, with a Mean Opinion Score (MOS) of 2.7167, and Cebuano scored highest in intelligibility, with a MOS of 3.0875. On the objective metrics, Hiligaynon had the lowest Mel-Cepstral Distortion (5.5335) and English had the lowest F0 RMSE (20.254). Intra-gender voice conversion also performed better than inter-gender conversion.
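The two objective metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, frame alignment, or units the authors used, so everything below is an assumption: the code uses the commonly cited Mel-Cepstral Distortion definition (in dB, skipping the 0th energy coefficient) and computes F0 RMSE only over frames that are voiced in both contours. The function names `mel_cepstral_distortion` and `f0_rmse` are hypothetical, and the inputs are assumed to be already time-aligned.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean MCD in dB over time-aligned frames of mel-cepstral coefficients.

    Assumption: coefficient 0 (energy) is excluded, as is common practice.
    """
    if len(ref_frames) != len(conv_frames) or not ref_frames:
        raise ValueError("need the same non-zero number of aligned frames")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        # Squared Euclidean distance over coefficients 1..D of this frame.
        squared = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


def f0_rmse(ref_f0, conv_f0):
    """RMSE between two F0 contours over frames voiced in both.

    Assumption: an unvoiced frame is marked with F0 == 0.
    """
    pairs = [(r, c) for r, c in zip(ref_f0, conv_f0) if r > 0 and c > 0]
    if not pairs:
        raise ValueError("no frames voiced in both contours")
    return math.sqrt(sum((r - c) ** 2 for r, c in pairs) / len(pairs))


if __name__ == "__main__":
    # Toy values for illustration only; they are not data from the paper.
    ref = [[1.0, 0.5, -0.2], [0.9, 0.4, -0.1]]
    conv = [[1.1, 0.4, -0.2], [0.7, 0.4, 0.0]]
    print(mel_cepstral_distortion(ref, conv))
    print(f0_rmse([100.0, 0.0, 200.0], [110.0, 120.0, 190.0]))
```

For both metrics, lower values mean the converted speech is closer to the target, which is why the abstract reports the lowest scores as the best results.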