Error Correction Based on Transformer LM in Uyghur Speech Recognition

Yan Zhang, Mijit Ablimit, A. Hamdulla
DOI: 10.1109/PRML52754.2021.9520740 (https://doi.org/10.1109/PRML52754.2021.9520740)
Published in: 2021 IEEE 2nd International Conference on Pattern Recognition and Machine Learning (PRML)
Publication date: 2021-07-16
Citations: 1

Abstract

For Uyghur, Kazakh, and other minority languages or dialects, large-scale labeled corpora are difficult to collect. In such low-resource settings, reducing the recognition granularity by using phonemes or characters as the recognition unit yields a good character recognition rate, but it does not make full use of word-level information, so it does not solve the problem of high word error rate (WER) in practice. To correct misrecognized words, this paper proposes using Levenshtein distance and a Transformer language model with words as modeling units as secondary scoring criteria to correct end-to-end recognition results. In Uyghur end-to-end recognition with a Conformer-CTC acoustic model, the WER decreases by 5.7%; with a BLSTM-CTC acoustic model, it decreases by 9.1%.
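The abstract names the two scoring signals (Levenshtein distance and a word-level language model) but does not give the scoring formula, so the following is only a minimal sketch of how such a second-pass correction could look, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the `max_dist` and `dist_weight` parameters, the rule "LM score minus weighted edit distance", the toy bigram table standing in for the Transformer LM, and the English placeholder words used in place of Uyghur vocabulary.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Edit distance with unit cost for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


# Hypothetical stand-in for a word-level Transformer LM: a tiny bigram table
# of log-probabilities. A real system would query a trained model instead.
TOY_BIGRAM_LOGPROB: Dict[Tuple[str, str], float] = {
    ("automatic", "speech"): -1.0,
    ("speech", "recognition"): -1.0,
}


def toy_lm_score(context: List[str], word: str) -> float:
    """Log-score of `word` given the previous word (assumed interface)."""
    prev_word = context[-1] if context else "<s>"
    return TOY_BIGRAM_LOGPROB.get((prev_word, word), -10.0)


def correct_hypothesis(
    words: List[str],
    vocab: List[str],
    lm_score: Callable[[List[str], str], float],
    max_dist: int = 2,         # assumed candidate radius
    dist_weight: float = 1.0,  # assumed trade-off between LM and distance
) -> List[str]:
    """Replace out-of-vocabulary words with the best-scoring nearby word."""
    corrected: List[str] = []
    for w in words:
        if w in vocab:
            corrected.append(w)
            continue
        scored = []
        for cand in vocab:
            d = levenshtein(w, cand)
            if d <= max_dist:
                # Higher is better: LM log-score penalized by edit distance.
                scored.append((lm_score(corrected, cand) - dist_weight * d, cand))
        # Keep the original word when no candidate is close enough.
        corrected.append(max(scored)[1] if scored else w)
    return corrected


VOCAB = ["automatic", "speech", "peach", "recognition"]
hypothesis = ["automatic", "speach", "recognition"]
print(correct_hypothesis(hypothesis, VOCAB, toy_lm_score))
```

In this toy run, "speech" and "peach" are both one edit away from the misrecognized word, so edit distance alone cannot choose between them; the language model score given the preceding word breaks the tie. That is the intuition behind using word-level context as a secondary criterion on top of character-level recognition output.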