{"title":"TTS持续时间模型对新说话者的有效自适应","authors":"Chilin Shih, Wentao Gu, J. V. Santen","doi":"10.21437/ICSLP.1998-5","DOIUrl":null,"url":null,"abstract":"This paper discusses a methodology using a minimal set of sentences to adapt an existing TTS duration model to capture interspeaker variations. The assumption is that the original duration database contains information of both language-specific and speaker-specific duration characteristics. In training a duration model for a new speaker, only the speaker-specific information needs to be modeled, therefore the size of the training data can be reduced drastically. Results from several experiments are compared and discussed.","PeriodicalId":90685,"journal":{"name":"Proceedings : ICSLP. International Conference on Spoken Language Processing","volume":"151 1","pages":"105-110"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Efficient adaptation of TTS duration model to new speakers\",\"authors\":\"Chilin Shih, Wentao Gu, J. V. Santen\",\"doi\":\"10.21437/ICSLP.1998-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper discusses a methodology using a minimal set of sentences to adapt an existing TTS duration model to capture interspeaker variations. The assumption is that the original duration database contains information of both language-specific and speaker-specific duration characteristics. In training a duration model for a new speaker, only the speaker-specific information needs to be modeled, therefore the size of the training data can be reduced drastically. Results from several experiments are compared and discussed.\",\"PeriodicalId\":90685,\"journal\":{\"name\":\"Proceedings : ICSLP. International Conference on Spoken Language Processing\",\"volume\":\"151 1\",\"pages\":\"105-110\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-11-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings : ICSLP. International Conference on Spoken Language Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.21437/ICSLP.1998-5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings : ICSLP. International Conference on Spoken Language Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21437/ICSLP.1998-5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficient adaptation of TTS duration model to new speakers
This paper discusses a methodology using a minimal set of sentences to adapt an existing TTS duration model to capture interspeaker variations. The assumption is that the original duration database contains information of both language-specific and speaker-specific duration characteristics. In training a duration model for a new speaker, only the speaker-specific information needs to be modeled, therefore the size of the training data can be reduced drastically. Results from several experiments are compared and discussed.