On-Device End-to-End Speech Recognition with Multi-Step Parallel RNNs

2018 IEEE Spoken Language Technology Workshop (SLT) | Pub Date: 2018-12-01 | DOI: 10.1109/SLT.2018.8639662

Yoonho Boo, Jinhwan Park, Lukas Lee, Wonyong Sung

Citations: 0

Abstract

Most automatic speech recognition is currently performed on remote servers. However, demand for speech recognition on personal devices is increasing, driven by the need for shorter recognition latency and stronger privacy. End-to-end speech recognition based on recurrent neural networks (RNNs) achieves good accuracy, but conventional RNNs, such as the long short-term memory (LSTM) or gated recurrent unit (GRU), require many memory accesses during execution, which hinders real-time operation on smartphones and embedded systems. To solve this problem, we built an end-to-end acoustic model (AM) using linear recurrent units instead of LSTM or GRU and employed a multi-step parallel approach to reduce the number of DRAM accesses. The AM is trained with the connectionist temporal classification (CTC) loss, and decoding is performed using weighted finite-state transducers (WFSTs). The proposed system achieves 4.8× real-time speed when executed on a single core of an ARM CPU-based system.
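
To illustrate why a multi-step parallel approach can reduce memory traffic, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a simplified linear recurrent unit in which the candidate and the forget gate depend only on the current input, so the only sequential part is an element-wise update. The function names, the gate formulation, and the block size are all hypothetical choices made for this illustration; the abstract does not specify them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def step_by_step(X, W, Wf, bf, c0):
    """Baseline: one frame at a time, so W and Wf are re-read on every step."""
    c = c0.copy()
    out = np.empty((X.shape[0], W.shape[0]))
    for t, x in enumerate(X):              # X has shape (T, d_in)
        x_tilde = W @ x                    # candidate (matrix-vector product)
        f = sigmoid(Wf @ x + bf)           # gate computed from the input only (assumption)
        c = f * c + (1.0 - f) * x_tilde    # element-wise recurrence
        out[t] = c
    return out

def multi_step_parallel(X, W, Wf, bf, c0, block=8):
    """Sketch: process `block` frames per weight pass using matrix-matrix products."""
    c = c0.copy()
    out = np.empty((X.shape[0], W.shape[0]))
    for start in range(0, X.shape[0], block):
        Xb = X[start:start + block]        # (B, d_in); the last block may be shorter
        X_tilde = Xb @ W.T                 # one pass over W covers B frames
        F = sigmoid(Xb @ Wf.T + bf)        # one pass over Wf covers B frames
        for i in range(Xb.shape[0]):       # cheap element-wise part stays sequential
            c = F[i] * c + (1.0 - F[i]) * X_tilde[i]
            out[start + i] = c
    return out

# Hypothetical sizes, chosen only for the demonstration.
rng = np.random.default_rng(0)
T, d_in, d_h = 20, 5, 7
X = rng.standard_normal((T, d_in))
W = rng.standard_normal((d_h, d_in))
Wf = rng.standard_normal((d_h, d_in))
bf = rng.standard_normal(d_h)
c0 = np.zeros(d_h)

same = np.allclose(step_by_step(X, W, Wf, bf, c0),
                   multi_step_parallel(X, W, Wf, bf, c0, block=8))
```

Under these assumptions both functions compute the same hidden states, but the blocked version traverses each weight matrix once per block rather than once per frame. When the weights do not fit in cache, that is the kind of reuse that would lower the number of DRAM accesses. An LSTM or GRU cannot be batched over time this way, because their gates also depend on the previous hidden state through a matrix product.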

