Evaluation of Synthetic Speech Quality by Statistical Analysis of Voiced and Unvoiced Part Durations
J. Pribil, A. Přibilová, J. Matoušek
2018 41st International Conference on Telecommunications and Signal Processing (TSP), 2018-07-01
DOI: 10.1109/TSP.2018.8441352 (https://doi.org/10.1109/TSP.2018.8441352)
Citation count: 1
Abstract
The paper describes a system for the automatic evaluation of differences in duration, phrasing, and time structuring within an analysed sentence. The proposed system was successfully tested on sentences originating from male and female voices and produced by a speech synthesizer using the unit selection method with two different prosody manipulation approaches. A detailed analysis shows that the number of statistical parameters strongly influences the correctness and precision of the evaluation results. A larger amount of processed speech material improves the stability of the evaluation process. The results obtained correlate in principle with an evaluation based on the standard listening test method.
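The abstract does not specify which statistical parameters are used or how voiced and unvoiced parts are detected. As a rough illustration only, the sketch below assumes that a per-frame voiced/unvoiced decision is already available (for example from a pitch detector), uses a hypothetical frame length of 10 ms, and compares just the mean and standard deviation of segment durations between a reference and a synthetic sentence. The function names `segment_durations`, `duration_stats`, and `compare_durations` are made up for this example and are not taken from the paper.

```python
from itertools import groupby
from statistics import mean, pstdev


def segment_durations(voiced_flags, frame_ms=10.0):
    """Group consecutive frames with the same voicing flag into segments
    and return their durations in milliseconds, split by class.

    voiced_flags: iterable of truthy (voiced) / falsy (unvoiced) values.
    frame_ms: assumed frame length; 10 ms is a hypothetical default.
    """
    durations = {"voiced": [], "unvoiced": []}
    for is_voiced, run in groupby(voiced_flags, key=bool):
        key = "voiced" if is_voiced else "unvoiced"
        durations[key].append(sum(1 for _ in run) * frame_ms)
    return durations


def duration_stats(values):
    """Return count, mean, and population standard deviation of durations."""
    if not values:
        return {"count": 0, "mean": 0.0, "std": 0.0}
    return {"count": len(values), "mean": mean(values), "std": pstdev(values)}


def compare_durations(reference_flags, synthetic_flags, frame_ms=10.0):
    """Difference (synthetic minus reference) of duration statistics,
    computed separately for voiced and unvoiced segments."""
    ref = segment_durations(reference_flags, frame_ms)
    syn = segment_durations(synthetic_flags, frame_ms)
    result = {}
    for key in ("voiced", "unvoiced"):
        r, s = duration_stats(ref[key]), duration_stats(syn[key])
        result[key] = {
            "mean_diff": s["mean"] - r["mean"],
            "std_diff": s["std"] - r["std"],
        }
    return result


if __name__ == "__main__":
    reference = [1, 1, 0, 0, 0, 1]   # toy voicing flags, not real data
    synthetic = [1, 1, 1, 0, 0, 1]
    print(compare_durations(reference, synthetic))
```

A real evaluation along the lines of the abstract would presumably use more statistical parameters and far more speech material, since the abstract reports that both affect the precision and stability of the result.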