Error Correction Based on Transformer LM in Uyghur Speech Recognition

2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML) Pub Date : 2021-07-16 DOI:10.1109/PRML52754.2021.9520740

Yan Zhang, Mijit Ablimit, A. Hamdulla


Citations: 1

Abstract


Error Correction Based on Transformer LM in Uyghur Speech Recognition

For Uyghur, Kazakh, and other minority languages or dialects, it is difficult to collect a large-scale labeled corpus. In low-resource settings, reducing the recognition granularity by using phonemes or characters as the recognition unit yields a good character recognition rate, but the information between words is not fully exploited, so the high word error rate (WER) seen in practice remains unsolved. To correct misrecognized words, this paper proposes using Levenshtein distance and a Transformer language model with words as modeling units as secondary scoring criteria to correct the end-to-end recognition results. In Uyghur end-to-end recognition with a Conformer-CTC acoustic model, the WER decreases by 5.7%; with a BLSTM-CTC acoustic model, it decreases by 9.1%.
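The abstract does not say how correction candidates are generated or how the two scores are combined, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that out-of-vocabulary words in the recognizer output are replaced by the in-vocabulary word that maximizes a language-model score minus a weighted Levenshtein distance. The names `correct`, `max_dist`, and `alpha` are hypothetical, and `toy_lm_score` is a made-up bigram table standing in for a word-level Transformer LM.

```python
from typing import Callable, List, Sequence, Set


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def correct(
    words: List[str],
    vocab: Set[str],
    lm_score: Callable[[List[str]], float],
    max_dist: int = 2,   # assumed candidate radius, not from the paper
    alpha: float = 1.0,  # assumed weight on edit distance, not from the paper
) -> List[str]:
    """Replace out-of-vocabulary words using LM score and edit distance."""
    out = list(words)
    for i, w in enumerate(words):
        if w in vocab:
            continue  # only words outside the vocabulary are treated as errors
        # sorted() keeps candidate order, and therefore tie-breaking, stable
        candidates = [v for v in sorted(vocab) if levenshtein(w, v) <= max_dist]
        if not candidates:
            continue

        def score(v: str) -> float:
            hypothesis = out[:i] + [v] + out[i + 1:]
            return lm_score(hypothesis) - alpha * levenshtein(w, v)

        out[i] = max(candidates, key=score)
    return out


# Toy stand-in for a word-level language model: sums made-up bigram scores.
TOY_BIGRAMS = {("the", "cat"): 2.0, ("cat", "sat"): 2.0}


def toy_lm_score(hypothesis: List[str]) -> float:
    return sum(TOY_BIGRAMS.get(pair, 0.0) for pair in zip(hypothesis, hypothesis[1:]))


toy_vocab = {"the", "cat", "cart", "sat"}
print(correct(["the", "caat", "sat"], toy_vocab, toy_lm_score))
# -> ['the', 'cat', 'sat']
```

In this toy run, "cat" and "cart" are both one edit away from "caat", so edit distance alone cannot choose between them; the language-model term prefers "cat" because it fits the neighboring words. A real system would replace `toy_lm_score` with the log-probability from a trained word-level model.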

