Zhirong Wang, Umut Topkara, Tanja Schultz, A. Waibel
Towards universal speech recognition
Proceedings. Fourth IEEE International Conference on Multimodal Interfaces, 2002-10-14
https://doi.org/10.1109/ICMI.2002.1167001
The increasing interest in multilingual applications like speech-to-speech translation systems is accompanied by the need for speech recognition front-ends in many languages that can also handle multiple input languages at the same time. We describe a universal speech recognition system that fulfills such needs. It is trained by sharing speech and text data across languages and thus reduces the number of parameters and overhead significantly at the cost of only slight accuracy loss. The final recognizer eases the burden of maintaining several monolingual engines, makes dedicated language identification obsolete and allows for code-switching within an utterance. To achieve these goals we developed new methods for constructing multilingual acoustic models and multilingual n-gram language models.