Voice Conversion of Philippine Spoken Languages using Deep Neural Networks

2020 IEEE 8th Conference on Systems, Process and Control (ICSPC). Pub Date: 2020-12-11. DOI: 10.1109/ICSPC50992.2020.9305801

Michael Gian V. Gonzales, C. R. Lucas, M. G. A. Bayona, F. D. de Leon


Citations: 2

Abstract


Voice Conversion of Philippine Spoken Languages using Deep Neural Networks

Most available voice conversion systems focus only on the spectral parameters of speech, such as the spectral envelope. This project developed a voice conversion system that converts not only the spectral parameters but also the prosodic features of speech, specifically a wavelet model of the F0 contour, to improve voice quality and naturalness. The system can be used in conjunction with text-to-speech systems to personalize and customize the speech output. The project was implemented not only for English but also for Philippine spoken languages such as Tagalog, Hiligaynon, and Cebuano. In the subjective evaluation, English voice conversion scored highest in naturalness, with a Mean Opinion Score (MOS) of 2.7167, and Cebuano scored highest in intelligibility, with an MOS of 3.0875. In the objective evaluation, Hiligaynon had the lowest Mel-Cepstral Distortion (5.5335), while English had the lowest F0 RMSE (20.254). Intra-gender voice conversion also performed better than inter-gender conversion.
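The abstract reports two objective metrics, Mel-Cepstral Distortion and F0 RMSE, without giving their formulas. The sketch below shows the definitions commonly used in voice conversion work; it is an assumption that the paper follows them. The function names, the exclusion of the 0th (energy) coefficient, and the convention that unvoiced frames carry F0 = 0 are illustrative choices, not details taken from the paper. Both functions expect frames that have already been time-aligned.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames):
    """Mean Mel-Cepstral Distortion in dB over time-aligned frames.

    Each frame is a list of mel-cepstral coefficients. The 0th
    coefficient (energy) is skipped, a common convention that is
    assumed here rather than taken from the paper.
    """
    if len(ref_frames) != len(conv_frames) or not ref_frames:
        raise ValueError("frame sequences must be non-empty and time-aligned")
    # Common definition: (10 / ln 10) * sqrt(2 * sum of squared differences)
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared = sum((r - c) ** 2 for r, c in zip(ref[1:], conv[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


def f0_rmse(ref_f0, conv_f0):
    """Root-mean-square F0 error over frames voiced in both contours.

    Unvoiced frames are assumed to be marked with F0 = 0 and are skipped.
    """
    pairs = [(r, c) for r, c in zip(ref_f0, conv_f0) if r > 0 and c > 0]
    if not pairs:
        raise ValueError("no frames voiced in both contours")
    return math.sqrt(sum((r - c) ** 2 for r, c in pairs) / len(pairs))


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the paper.
    target = [[1.0, 0.5, 0.2], [1.1, 0.4, 0.1]]
    converted = [[0.9, 0.6, 0.2], [1.0, 0.4, 0.3]]
    print(mel_cepstral_distortion(target, converted))
    print(f0_rmse([100.0, 0.0, 200.0], [110.0, 120.0, 190.0]))
```

Lower is better for both metrics, which is why the abstract singles out the lowest values.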

