{"title":"HMM-based Thai speech synthesis using unsupervised stress context labeling","authors":"Decha Moungsri, Tomoki Koriyama, Takao Kobayashi","doi":"10.1109/APSIPA.2014.7041599","DOIUrl":null,"url":null,"abstract":"This paper describes an approach to HMM-based Thai speech synthesis using stress context. It has been shown that context related to stressed/unstressed syllable information (stress context) significantly improves the tone correctness of the synthetic speech, but there is a problem of requiring a manual context labeling process in tone modeling. To reduce costs for the stress context labeling, we propose an unsupervised technique for automatic labeling based on the characteristics of Thai stressed syllables, namely, having high FO movement and long duration. In the proposed technique, we use log FO variance and duration of each syllable to classify it into one of stress-related context classes. Objective and subjective evaluation results show that the proposed context labeling gives comparable performance to that conducted carefully by a human in terms of tone naturalness of synthetic speech.","PeriodicalId":231382,"journal":{"name":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSIPA.2014.7041599","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
This paper describes an approach to HMM-based Thai speech synthesis using stress context. It has been shown that context related to stressed/unstressed syllable information (stress context) significantly improves the tone correctness of the synthetic speech, but there is a problem of requiring a manual context labeling process in tone modeling. To reduce costs for the stress context labeling, we propose an unsupervised technique for automatic labeling based on the characteristics of Thai stressed syllables, namely, having high FO movement and long duration. In the proposed technique, we use log FO variance and duration of each syllable to classify it into one of stress-related context classes. Objective and subjective evaluation results show that the proposed context labeling gives comparable performance to that conducted carefully by a human in terms of tone naturalness of synthetic speech.