Michael Gian V. Gonzales, C. R. Lucas, M. G. A. Bayona, F. D. de Leon
2020 IEEE 8th Conference on Systems, Process and Control (ICSPC), published 2020-12-11. DOI: 10.1109/ICSPC50992.2020.9305801 (https://doi.org/10.1109/ICSPC50992.2020.9305801)
Voice Conversion of Philippine Spoken Languages using Deep Neural Networks
Most available voice conversion systems convert only the spectral parameters of speech, such as the spectral envelope. This project developed a voice conversion system that converts both the spectral parameters and the prosodic features of speech, specifically a wavelet model of the F0 contour, to improve voice quality and naturalness. The system can be used together with text-to-speech systems to personalize and customize the speech output. It was implemented for English and for Philippine spoken languages such as Tagalog, Hiligaynon, and Cebuano. In the subjective evaluation, English voice conversion yielded the highest naturalness, with a Mean Opinion Score of 2.7167, and Cebuano the highest intelligibility, with a Mean Opinion Score of 3.0875. In the objective evaluation, Hiligaynon had the lowest Mel-Cepstral Distortion, at 5.5335, and English the lowest F0 RMSE, at 20.254. Intra-gender voice conversion also performed better than inter-gender conversion.
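The abstract reports two objective metrics without stating how they were computed. As a rough illustration only, the sketch below uses a commonly seen formulation: Mel-Cepstral Distortion in dB averaged over frames, with coefficient 0 excluded, and F0 RMSE in Hz over frames that are voiced in both contours. The function names, the input layout, the units, the exclusion of coefficient 0, and the assumption that frames are already time-aligned are all assumptions of this sketch, not details taken from the paper.

```python
import math


def mel_cepstral_distortion(ref_mcep, conv_mcep):
    """Mean MCD (dB) over frames; each input is a list of per-frame coefficient lists.

    Assumes the two sequences are already time-aligned (e.g. by DTW).
    """
    if not ref_mcep or len(ref_mcep) != len(conv_mcep):
        raise ValueError("inputs must be non-empty and have the same number of frames")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref_frame, conv_frame in zip(ref_mcep, conv_mcep):
        if len(ref_frame) != len(conv_frame):
            raise ValueError("frames must have the same number of coefficients")
        # Skip coefficient 0 (energy term), a common convention assumed here.
        squared = sum((r - c) ** 2 for r, c in zip(ref_frame[1:], conv_frame[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_mcep)


def f0_rmse(ref_f0, conv_f0):
    """RMSE (Hz) over frames where both contours are voiced (F0 > 0)."""
    if len(ref_f0) != len(conv_f0):
        raise ValueError("contours must have the same number of frames")
    errors = [(r - c) ** 2 for r, c in zip(ref_f0, conv_f0) if r > 0 and c > 0]
    if not errors:
        raise ValueError("no frames are voiced in both contours")
    return math.sqrt(sum(errors) / len(errors))
```

Under these assumptions, lower values of either metric mean the converted speech is closer to the target speaker's reference recording, which is how the "lowest" figures in the abstract should be read.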