A Unified Phonological Representation of South Asian Languages for Multilingual Text-to-Speech
Isin Demirsahin, Martin Jansche, Alexander Gutkin
Workshop on Spoken Language Technologies for Under-resourced Languages, 2018-08-29
DOI: https://doi.org/10.21437/SLTU.2018-17
We present a multilingual phoneme inventory and inclusion mappings from the native inventories of several major South Asian languages for multilingual parametric text-to-speech synthesis (TTS). Our goal is to reduce the need for training data when building new TTS voices by leveraging available data for similar languages within a common feature design. For West Bengali, Gujarati, Kannada, Malayalam, Marathi, Tamil, Telugu, and Urdu, we compare TTS voices trained only on monolingual data with voices trained on multilingual data from 12 languages. In subjective evaluations, the multilingually trained voices outperform the corresponding monolingual voices or, in a few cases, are statistically tied with them. The multilingual setup can also be used to synthesize speech for languages not seen in the training data; preliminary evaluations of this use are encouraging. Our results indicate that pooling data from different languages in a single acoustic model can be beneficial, opening up new uses and research questions.
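To make the idea of an inclusion mapping concrete, the following is a minimal sketch and not the authors' implementation. It assumes each language's native phoneme symbols map into one shared inventory whose entries carry phonological features, so that an acoustic model could be trained on pooled data described in a common feature space. The phoneme labels, feature names, language codes, and the names `UNIFIED_INVENTORY`, `INCLUSION_MAPS`, `to_unified`, and `to_features` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical shared inventory: unified phoneme label -> phonological features.
# Labels and features are illustrative, not taken from the paper.
UNIFIED_INVENTORY: Dict[str, Dict[str, str]] = {
    "k": {"type": "consonant", "place": "velar", "manner": "stop"},
    "t_dental": {"type": "consonant", "place": "dental", "manner": "stop"},
    "t_retroflex": {"type": "consonant", "place": "retroflex", "manner": "stop"},
    "a": {"type": "vowel", "height": "open", "backness": "central"},
}

# Hypothetical inclusion mappings: for each language, every native symbol
# maps to exactly one member of the shared inventory.
INCLUSION_MAPS: Dict[str, Dict[str, str]] = {
    "te": {"k": "k", "t": "t_dental", "T": "t_retroflex", "a": "a"},
    "ta": {"k": "k", "th": "t_dental", "tt": "t_retroflex", "a": "a"},
}


def to_unified(lang: str, native_phonemes: List[str]) -> List[str]:
    """Map a native phoneme sequence into the shared inventory."""
    mapping = INCLUSION_MAPS[lang]
    return [mapping[p] for p in native_phonemes]


def to_features(lang: str, native_phonemes: List[str]) -> List[Dict[str, str]]:
    """Look up the shared phonological features for a native sequence."""
    return [UNIFIED_INVENTORY[u] for u in to_unified(lang, native_phonemes)]


if __name__ == "__main__":
    # Two languages with different native symbols end up in the same space.
    print(to_unified("te", ["k", "a", "T", "a"]))
    print(to_unified("ta", ["k", "a", "tt", "a"]))
```

In this sketch, sequences from different languages that use different native symbols resolve to the same unified labels, which is what would let data from several languages be pooled in one model. How the actual inventory and feature design are defined is specified in the paper itself, not here.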