M. Gales, Frank Diehl, C. Raut, M. Tomalin, P. Woodland, Kai Yu
2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU). Published 2007-12-13. DOI: 10.1109/ASRU.2007.4430078 (https://doi.org/10.1109/ASRU.2007.4430078). Citation count: 29 (Semantic Scholar).
Development of a phonetic system for large vocabulary Arabic speech recognition
This paper describes the development of an Arabic speech recognition system based on a phonetic dictionary. Though phonetic systems have been previously investigated, this paper makes a number of contributions to the understanding of how to build these systems, as well as describing a complete Arabic speech recognition system. The first issue considered is discriminative training when each word has a large number of pronunciation variants. In particular, the loss function associated with minimum phone error (MPE) training is examined. The performance and combination of phonetic and graphemic acoustic models are then compared on both Broadcast News (BN) and Broadcast Conversation (BC) data. The final contribution of the paper is a simple scheme for automatically generating pronunciations, both for use in training and for reducing the phonetic out-of-vocabulary rate. The paper concludes with a description of, and results from, using phonetic and graphemic systems in a multipass/combination framework.
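The phonetic out-of-vocabulary rate mentioned in the abstract can be read as the fraction of running words that have no entry in the phonetic dictionary. The sketch below assumes that token-level definition, which the abstract itself does not spell out. The function name, the toy tokens and both lexicons are hypothetical illustrations and are not taken from the paper.

```python
from typing import Iterable, Set


def phonetic_oov_rate(tokens: Iterable[str], lexicon: Set[str]) -> float:
    """Return the fraction of running word tokens with no lexicon entry.

    Assumption: the rate is counted per token (running word), not per
    distinct word type.
    """
    total = 0
    missing = 0
    for tok in tokens:
        total += 1
        if tok not in lexicon:
            missing += 1
    if total == 0:
        raise ValueError("token stream is empty")
    return missing / total


# Hypothetical toy data: transliterated word tokens and two lexicons.
tokens = ["ktb", "wktb", "ktb", "mdrsp", "wktb", "ktb", "qlm", "ktb"]
manual_lexicon = {"ktb", "qlm"}       # words with hand-checked pronunciations
generated_lexicon = {"wktb"}          # words with auto-generated pronunciations

rate_before = phonetic_oov_rate(tokens, manual_lexicon)
rate_after = phonetic_oov_rate(tokens, manual_lexicon | generated_lexicon)
print(f"OOV rate, manual lexicon only: {rate_before:.3f}")
print(f"OOV rate, with generated entries: {rate_after:.3f}")
```

Taking the union of the two lexicons is only a stand-in for whatever scheme adds automatically generated pronunciations. It shows why extra entries can only keep the rate the same or lower it, never raise it.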