End-to-End Model Based on RNN-T for Kazakh Speech Recognition

O. Mamyrbayev, Dina Oralbekova, A. Kydyrbekova, Tolganay Turdalykyzy, A. Bekarystankyzy
Published in: 2021 3rd International Conference on Computer Communication and the Internet (ICCCI)
Publication date: 2021-06-25
DOI: 10.1109/ICCCI51764.2021.9486811 (https://doi.org/10.1109/ICCCI51764.2021.9486811)
Citations: 7

Abstract

Automatic speech recognition is a rapidly developing area of machine learning. The most popular speech recognition systems today are end-to-end systems, especially online end-to-end models, which directly output a sequence of words from the input audio in real time. Streaming speech recognition passes the audio stream to speech-to-text conversion and returns the recognition results in real time as the audio is processed. This article discusses and implements a popular RNN-T-based model for recognizing Kazakh speech. It also analyzes prior work on Kazakh speech recognition based on the CTC model. The findings demonstrate that an RNN-T-based model can work well without additional components, such as a language model, and that it showed the best outcome on our dataset. The system reached a character error rate (CER) of 10.6%, which is the best result among end-to-end systems for recognizing Kazakh speech.
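The abstract reports its result as a CER. As a minimal sketch, assuming the common definition of CER (character-level Levenshtein edit distance divided by the number of characters in the reference), the metric can be computed as below. The abstract does not state how the authors computed or normalized it, and the names `edit_distance` and `cer` and the example strings are illustrative, not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty string takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, assuming CER = edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: one substituted character in a five-character word.
    print(cer("сәлем", "салем"))
```

Under this definition a CER of 10.6% corresponds to roughly one character error per ten reference characters. Real evaluations usually sum the edit distances over the whole test set before dividing by the total reference length, rather than averaging per-utterance values.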