Error Correction Based on Transformer LM in Uyghur Speech Recognition
Authors: Yan Zhang, Mijit Ablimit, A. Hamdulla
Venue: 2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)
Published: 2021-07-16
DOI: 10.1109/PRML52754.2021.9520740 (https://doi.org/10.1109/PRML52754.2021.9520740)
Citation count: 1
Abstract
For Uyghur, Kazakh, and other minority languages or dialects, it is difficult to collect a large-scale labeled corpus. In low-resource settings, reducing the recognition granularity by using phonemes or characters as the recognition unit yields a good character recognition rate, but word-level information is not fully exploited, so the high word error rate (WER) seen in practice remains unsolved. To correct misrecognized words, this paper proposes using Levenshtein distance and a Transformer language model with words as modeling units as secondary scoring criteria to correct the end-to-end recognition results. In Uyghur end-to-end recognition with a Conformer-CTC acoustic model, the WER decreases by 5.7%; with a BLSTM-CTC acoustic model, it decreases by 9.1%.
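The abstract does not spell out how the two scoring criteria are combined, so the following is only a minimal sketch of the general idea under stated assumptions: words missing from a known vocabulary are treated as suspect, replacement candidates are drawn from the vocabulary within a maximum Levenshtein distance, and the candidate whose sentence scores highest under a language model is kept. The names `candidates`, `correct`, `max_dist`, and `toy_lm_score` are hypothetical, and `toy_lm_score` is a bigram-count stand-in for the word-level Transformer language model, not the paper's model. English toy words are used purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]


def candidates(word: str, vocab: Sequence[str], max_dist: int = 2) -> List[str]:
    """Vocabulary words within max_dist edits of the suspect word."""
    return [v for v in vocab if levenshtein(word, v) <= max_dist]


def correct(words: Sequence[str],
            vocab: Sequence[str],
            lm_score: Callable[[List[str]], float],
            max_dist: int = 2) -> List[str]:
    """Replace out-of-vocabulary words with the best-scoring nearby candidate.

    Assumption: the LM score is the primary criterion and a smaller edit
    distance only breaks ties. The source does not specify the weighting.
    """
    vocab_set = set(vocab)
    out = list(words)
    for i, w in enumerate(out):
        if w in vocab_set:
            continue  # in-vocabulary words are left untouched in this sketch
        cands = candidates(w, vocab, max_dist)
        if not cands:
            continue  # nothing close enough; keep the recognizer's output
        out[i] = max(
            cands,
            key=lambda c: (lm_score(out[:i] + [c] + out[i + 1:]),
                           -levenshtein(w, c)),
        )
    return out


# Hypothetical stand-in for a word-level Transformer LM: sums bigram counts.
BIGRAMS: Dict[Tuple[str, str], int] = {
    ("i", "read"): 1, ("read", "a"): 1, ("a", "book"): 1,
}


def toy_lm_score(sentence: List[str]) -> float:
    return float(sum(BIGRAMS.get(pair, 0) for pair in zip(sentence, sentence[1:])))


VOCAB = ["a", "book", "i", "read", "red"]
corrected = correct(["i", "reed", "a", "book"], VOCAB, toy_lm_score)
print(corrected)
```

Here both "read" and "red" lie one edit away from the misrecognized "reed", so edit distance alone cannot decide; the language-model score is what selects "read". In a real system, `toy_lm_score` would be replaced by the sentence log-probability from the trained language model.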