Text-To-Speech quality evaluation based on LSTM Recurrent Neural Networks
Meng Tang, Jie Zhu
2019 International Conference on Computing, Networking and Communications (ICNC), 2019-02-01
DOI: 10.1109/ICCNC.2019.8685619
Citations: 3
Abstract
Text-to-speech (TTS) systems have reached a high level of quality, but there is still no objective assessment method that evaluates synthesized speech effectively. Research on objective assessment generally centers on predicting the mean opinion score (MOS) of the speech. This paper proposes a Mandarin TTS evaluation method that uses LSTM+LR to predict the MOS. To the best of our knowledge, this is the first study on evaluating Mandarin TTS. Compared with other methods such as CNN+LR, the previous best method, the proposed method achieves much higher accuracy, with a root mean square error (RMSE) of 0.40 and a correlation $\rho_S$ of 0.68.
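
The two figures reported above compare predicted MOS values against human ratings. As a minimal sketch of how such metrics can be computed, the Python below implements RMSE and a rank correlation. It assumes that $\rho_S$ denotes Spearman's rank correlation, which the abstract does not state explicitly. The function names and the sample scores are hypothetical and are not taken from the paper.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length score lists."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


def spearman(predicted, actual):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return pearson(average_ranks(predicted), average_ranks(actual))


# Hypothetical MOS values on a 1-5 scale, for illustration only.
actual_mos = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_mos = [1.5, 2.5, 2.0, 4.5, 4.0]

print("RMSE:", rmse(predicted_mos, actual_mos))
print("Spearman:", spearman(predicted_mos, actual_mos))
```

A lower RMSE means the predictions are closer to the human scores in absolute terms. A higher rank correlation means the predictor orders utterances more like human listeners do, even if its absolute scores are offset.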