Sahoko Nakayama, Andros Tjandra, S. Sakti, Satoshi Nakamura
2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), published 2019-12-01. DOI: 10.1109/ASRU46091.2019.9003926
Zero-Shot Code-Switching ASR and TTS with Multilingual Machine Speech Chain
Constructing automatic speech recognition (ASR) and text-to-speech (TTS) systems for code-switching in a supervised fashion is challenging because large amounts of code-switching speech with corresponding transcriptions are usually unavailable. The machine speech chain mechanism can be used to achieve semi-supervised learning. The framework enables ASR and TTS to assist each other when they receive unpaired data, since it allows them to infer the missing half of each pair and optimize the models with a reconstruction loss. In this study, we handle multiple code-switching language pairs by integrating language embeddings into the machine speech chain, and we investigate whether the model can handle code-switching language pairs that are never explicitly seen during training. Experimental results reveal that the proposed approach improves performance on the code-switching language pairs with which the model was trained and can also handle unseen code-switching language pairs without being trained on them directly.
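The two unpaired-data loops described in the abstract can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: `toy_asr`, `toy_tts`, the list-of-floats "speech", the integer "tokens", the two loss functions, and the fixed `lang_emb` vector are all hypothetical stand-ins, and the abstract does not say how the language embedding is obtained or how it enters the real neural models.

```python
from typing import Callable, List

Speech = List[float]   # toy stand-in for acoustic frames
Text = List[int]       # toy stand-in for token ids
Embedding = List[float]

def toy_tts(tokens: Text, lang_emb: Embedding) -> Speech:
    # Hypothetical TTS: one "frame" per token, shifted by a language-dependent bias.
    bias = sum(lang_emb)
    return [float(t) + bias for t in tokens]

def toy_asr(frames: Speech, lang_emb: Embedding) -> Text:
    # Hypothetical ASR: remove the language bias and round to the nearest token id.
    bias = sum(lang_emb)
    return [round(f - bias) for f in frames]

def mse(a: Speech, b: Speech) -> float:
    # Mean squared error between two equal-length frame sequences.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def token_error(ref: Text, hyp: Text) -> float:
    # Fraction of positions where the tokens differ (equal lengths assumed).
    return sum(r != h for r, h in zip(ref, hyp)) / len(ref)

def unpaired_speech_loss(speech: Speech, asr: Callable, tts: Callable,
                         lang_emb: Embedding) -> float:
    # Speech only: ASR infers the missing text, TTS reconstructs the speech.
    text_hat = asr(speech, lang_emb)
    speech_hat = tts(text_hat, lang_emb)
    return mse(speech, speech_hat)

def unpaired_text_loss(text: Text, asr: Callable, tts: Callable,
                       lang_emb: Embedding) -> float:
    # Text only: TTS infers the missing speech, ASR reconstructs the text.
    speech_hat = tts(text, lang_emb)
    text_hat = asr(speech_hat, lang_emb)
    return token_error(text, text_hat)

if __name__ == "__main__":
    lang_emb = [0.5, -0.25]  # hypothetical embedding for one language pair
    print(unpaired_text_loss([3, 1, 4], toy_asr, toy_tts, lang_emb))
    print(unpaired_speech_loss([3.35, 1.15, 4.25], toy_asr, toy_tts, lang_emb))
```

In a real system both models would be neural networks and each reconstruction loss would be backpropagated to update them. The sketch only shows the data flow: the same language embedding conditions both models, and neither loop needs a paired example.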