Kumiko Morizane, Keigo Nakamura, T. Toda, H. Saruwatari, K. Shikano
Emphasized speech synthesis based on hidden Markov models
2009 Oriental COCOSDA International Conference on Speech Database and Assessments, 2009-10-02
DOI: 10.1109/ICSDA.2009.5278371 (https://doi.org/10.1109/ICSDA.2009.5278371)
Citation count: 21
Abstract
This paper presents a statistical approach to synthesizing emphasized speech based on hidden Markov models (HMMs). Context-dependent HMMs are trained on speech data in which the speaker intentionally emphasizes an arbitrary accentual phrase in a sentence. To model the acoustic characteristics of emphasized speech, new contextual factors describing the emphasized accentual phrase are additionally considered in model training. Moreover, to build HMMs for synthesizing both normal and emphasized speech, we investigate two training methods: one trains individual models for normal and emphasized speech, using each of the two types of speech data separately; the other trains a mixed model using both types simultaneously. The experimental results demonstrate that 1) HMM-based speech synthesis is effective for synthesizing emphasized speech and 2) the mixed model yields a more compact HMM set that generates more natural-sounding but slightly less emphasized speech than the individual models.