Non-Autoregressive Speech Recognition with Error Correction Module

Yukun Qian, Xuyi Zhuang, Zehua Zhang, Lianyu Zhou, Xu Lin, Mingjiang Wang
{"title":"Non-Autoregressive Speech Recognition with Error Correction Module","authors":"Yukun Qian, Xuyi Zhuang, Zehua Zhang, Lianyu Zhou, Xu Lin, Mingjiang Wang","doi":"10.23919/APSIPAASC55919.2022.9980031","DOIUrl":null,"url":null,"abstract":"Autoregressive models have achieved good performance in the field of speech recognition. However, the autore-gressive model uses recursive decoding and beam search in the inference stage, which leads to its slow inference speed. On the other hand, the non-autoregressive model naturally cannot utilize the context since all tokens are output at one time. To solve this problem, we propose a position-dependent non-autoregressive model. And in order to make better use of contextual information, we propose a pre-trained language model for speech recognition, which is placed behind the non-autoregressive model as an error correction module. In this way, we exchanged a smaller amount of calculation for the improvement of the recognition rate. Our method not only greatly reduces the computational cost, but also maintains a good recognition rate. We tested our model on the public Chinese speech corpus AISHELL-1. Our model achieves a 6.5% character error rate while the real-time factor is only 0.0022, which is 1/17 of the autoregressive model.","PeriodicalId":382967,"journal":{"name":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","volume":"29 7","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/APSIPAASC55919.2022.9980031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Autoregressive models have achieved strong performance in speech recognition. However, an autoregressive model relies on recursive decoding and beam search at inference time, which makes inference slow. A non-autoregressive model, on the other hand, cannot naturally exploit context, since all tokens are emitted in a single pass. To address this, we propose a position-dependent non-autoregressive model. To make better use of contextual information, we further propose a pre-trained language model for speech recognition, placed behind the non-autoregressive model as an error correction module. In this way, a small amount of extra computation is traded for an improved recognition rate: the method greatly reduces computational cost while maintaining good recognition accuracy. We evaluated the model on the public Chinese speech corpus AISHELL-1. It achieves a 6.5% character error rate with a real-time factor of only 0.0022, 1/17 that of the autoregressive model.
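
The abstract only names the components, so the following is a minimal PyTorch sketch of the two-stage pipeline it describes, not the paper's actual implementation: a non-autoregressive decoder emits every token in one parallel pass, and a pre-trained language model then re-reads the whole hypothesis as an error correction module. All class names, layer sizes, the vocabulary size, and the greedy correction strategy are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NARDecoder(nn.Module):
    """One-pass decoder: predicts all output tokens in parallel."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab_size)

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: (batch, frames, d_model) acoustic representations.
        # A single parallel projection replaces recursive decoding and
        # beam search, which is where the low real-time factor comes from.
        return self.proj(encoder_out).argmax(dim=-1)  # (batch, frames)

class ErrorCorrector(nn.Module):
    """Pre-trained LM placed behind the NAR decoder (assumed BERT-like)."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # The LM sees the entire NAR hypothesis at once, so it can use the
        # left and right context the NAR decoder itself cannot exploit.
        hidden = self.encoder(self.embed(tokens))
        return self.out(hidden).argmax(dim=-1)

# Usage: two parallel passes in total, instead of one sequential decoding
# step per output token as in an autoregressive model.
vocab_size, d_model = 4233, 256  # vocab_size is an illustrative figure
decoder = NARDecoder(d_model, vocab_size)
corrector = ErrorCorrector(vocab_size, d_model)
encoder_out = torch.randn(1, 50, d_model)  # stand-in for a real encoder
hypothesis = decoder(encoder_out)          # raw NAR hypothesis
corrected = corrector(hypothesis)          # LM-refined hypothesis
```

Because both stages run as single parallel passes, total decoding cost grows with sequence length only once, which is consistent with the abstract's claim of trading a small amount of extra computation for a better recognition rate.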