B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan
2013 IEEE International Conference of IEEE Region 10 (TENCON 2013), October 2013. DOI: 10.1109/TENCON.2013.6719019
Voice conversion-based multilingual to polyglot speech synthesizer for Indian languages
A multilingual text-to-speech (TTS) system synthesizes, for a given text, speech in multiple languages that is intelligible to human listeners. However, when the system is given mixed-language text, the synthesized output exhibits speaker switching at the language-switching points, which is annoying to listeners. To overcome this switching effect, a polyglot speech synthesizer is developed, which generates synthesized speech in multiple languages with a single voice identity. This can be achieved either by inherent voice conversion during synthesis, or by using voice conversion to convert the multilingual speech corpus into a polyglot speech corpus and then performing synthesis. In this work, the polyglot speech corpus is obtained using a Gaussian mixture model (GMM)-based cross-lingual voice conversion technique, and a polyglot speech synthesizer for Indian languages is developed using a hidden Markov model (HMM)-based synthesis technique. Here, speech data collected from native speakers of three Indian languages, namely Telugu, Malayalam, and Hindi, are converted to the voice identity of a native Tamil speaker. Building an HMM-based synthesizer on the resulting polyglot corpus enables the system to synthesize speech for any given text in any of these languages, or for mixed-language text. The performance of the polyglot speech synthesizer is evaluated with an ABX listening test that measures the similarity of the synthesized speech to the source or the target speaker. The scores show that the similarity to the target Tamil speaker ranges from 73% to 86%. The performance of the system is further analyzed with respect to speaker switching.
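The abstract names GMM-based voice conversion but does not give its formulation. As a rough illustration only, the sketch below assumes the widely used joint-density GMM with a minimum mean-square-error (MMSE) mapping, and it assumes that time-aligned source/target feature frames (for example, spectral feature vectors) are already available. How such pairs are obtained across different languages is not described in the abstract, so that step is left out. The function names `fit_joint_gmm`, `log_gauss`, and `convert`, and all parameter values, are hypothetical.

```python
import numpy as np

# Hypothetical sketch of joint-density GMM voice conversion (MMSE mapping).
# Assumption: x (source) and y (target) are already time-aligned frame pairs.


def log_gauss(z, mean, cov):
    """Log-density of each row of z under a full-covariance Gaussian."""
    d = z.shape[1]
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff.T * np.linalg.solve(cov, diff.T), axis=0)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def fit_joint_gmm(x, y, n_components=2, n_iter=50, seed=0, reg=1e-6):
    """Fit a GMM to joint vectors z = [x, y] with plain EM."""
    z = np.hstack([x, y])
    t, d = z.shape
    rng = np.random.default_rng(seed)
    means = z[rng.choice(t, n_components, replace=False)].astype(float)
    covs = np.array([np.cov(z.T) + reg * np.eye(d)] * n_components)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = np.stack([log_gauss(z, means[k], covs[k])
                          for k in range(n_components)], axis=1) + np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and full joint covariances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / t
        means = (resp.T @ z) / nk[:, None]
        for k in range(n_components):
            diff = z - means[k]
            covs[k] = (resp[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return weights, means, covs


def convert(x, weights, means, covs, dx):
    """E[y | x] = sum_k P(k | x) * (mu_y_k + S_yx_k S_xx_k^-1 (x - mu_x_k))."""
    n_comp = len(weights)
    # Posterior P(k | x) from the source-side marginal of each joint component.
    log_p = np.stack([log_gauss(x, means[k, :dx], covs[k, :dx, :dx])
                      for k in range(n_comp)], axis=1) + np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)
    y = np.zeros((x.shape[0], means.shape[1] - dx))
    for k in range(n_comp):
        mu_x, mu_y = means[k, :dx], means[k, dx:]
        s_xx, s_yx = covs[k, :dx, :dx], covs[k, dx:, :dx]
        # Per-component linear regression from source to target features.
        y_k = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
        y += post[:, k, None] * y_k
    return y


if __name__ == "__main__":
    # Toy data standing in for aligned source/target feature frames.
    rng = np.random.default_rng(1)
    src = rng.normal(size=(400, 2))
    tgt = src @ np.array([[0.8, -0.3], [0.2, 1.1]]).T + 0.5
    w, m, c = fit_joint_gmm(src, tgt)
    print("MSE:", np.mean((convert(src, w, m, c, dx=2) - tgt) ** 2))
```

In a pipeline like the one described, a mapping of this kind would be applied frame by frame to the Telugu, Malayalam, and Hindi speakers' features to move them toward the Tamil speaker's voice before HMM training. That wiring, and the choice of an MMSE mapping rather than another GMM-based variant, are assumptions for illustration, not details taken from the paper.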