Resonance-based spectral deformation in HMM-based speech synthesis
Jinfu Ni, Y. Shiga, H. Kawai, H. Kashioka
2012 8th International Symposium on Chinese Spoken Language Processing, 2012-12-01
DOI: 10.1109/ISCSLP.2012.6423478
https://doi.org/10.1109/ISCSLP.2012.6423478
Citations: 1
Abstract
Speech quality in statistical parametric speech synthesis depends on how well the training samples cover the relevant acoustic features. This paper presents a spectral deformation method that uses spectral-spatial information to expand the density space of acoustic features when only limited training samples are available. The method diffuses observed mel-cepstra in a resonance field, yielding multiple spectral variants governed by a resonance mechanism. When HMM-based voices are built, a statistical contribution of these mel-cepstral variants takes the place of the original mel-cepstra. Preliminary speech synthesis experiments were carried out in Chinese and Japanese. The results indicate that the proposed method can mitigate potential discontinuity and enhance speech formants for noise reduction, while achieving MOS quality at least as good as that obtained with the original mel-cepstra.