{"title":"评估南非语言自动语音识别的开源工具包","authors":"Ashentha Naidoo, M. Tsoeu","doi":"10.1109/ROBOMECH.2019.8704774","DOIUrl":null,"url":null,"abstract":"Automatic speech recognition is a critical component of human language technologies. It concerns the translation of speech into textual data which can be processed by computers. Thus, it offers the creation of an intimate link allowing humans to interact with machines on a completely natural level. A variety of open-source toolkits exist for the development of these systems. These toolkits have been successfully implemented and tested for use on well-resourced languages. However, the same level of testing has not been performed for South African languages. This investigation sets out to evaluate popular open-source tools for South African languages and identify optimal toolkit configurations for each language and toolkit. The NCHLT corpora were used to set up automatic speech recognition systems for English and isiXhosa using Kaldi, CMU Sphinx, and HTK. The word error rates achieved during this investigation showed that the best configurations from this investigation achieved better performance than those which were reported by the developers of the NCHLT corpus.","PeriodicalId":344332,"journal":{"name":"2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA)","volume":"198 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Evaluating Open-source Toolkits for Automatic Speech Recognition of South African Languages\",\"authors\":\"Ashentha Naidoo, M. Tsoeu\",\"doi\":\"10.1109/ROBOMECH.2019.8704774\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Automatic speech recognition is a critical component of human language technologies. It concerns the translation of speech into textual data which can be processed by computers. Thus, it offers the creation of an intimate link allowing humans to interact with machines on a completely natural level. A variety of open-source toolkits exist for the development of these systems. These toolkits have been successfully implemented and tested for use on well-resourced languages. However, the same level of testing has not been performed for South African languages. This investigation sets out to evaluate popular open-source tools for South African languages and identify optimal toolkit configurations for each language and toolkit. The NCHLT corpora were used to set up automatic speech recognition systems for English and isiXhosa using Kaldi, CMU Sphinx, and HTK. The word error rates achieved during this investigation showed that the best configurations from this investigation achieved better performance than those which were reported by the developers of the NCHLT corpus.\",\"PeriodicalId\":344332,\"journal\":{\"name\":\"2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA)\",\"volume\":\"198 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ROBOMECH.2019.8704774\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Southern African Universities Power Engineering Conference/Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ROBOMECH.2019.8704774","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Evaluating Open-source Toolkits for Automatic Speech Recognition of South African Languages
Automatic speech recognition is a critical component of human language technologies. It concerns the translation of speech into textual data which can be processed by computers. Thus, it offers the creation of an intimate link allowing humans to interact with machines on a completely natural level. A variety of open-source toolkits exist for the development of these systems. These toolkits have been successfully implemented and tested for use on well-resourced languages. However, the same level of testing has not been performed for South African languages. This investigation sets out to evaluate popular open-source tools for South African languages and identify optimal toolkit configurations for each language and toolkit. The NCHLT corpora were used to set up automatic speech recognition systems for English and isiXhosa using Kaldi, CMU Sphinx, and HTK. The word error rates achieved during this investigation showed that the best configurations from this investigation achieved better performance than those which were reported by the developers of the NCHLT corpus.