{"title":"Simple Gated Convnet for Small Footprint Acoustic Modeling","authors":"Lukas Lee, Jinhwan Park, Wonyong Sung","doi":"10.1109/ASRU46091.2019.9003993","DOIUrl":null,"url":null,"abstract":"Acoustic modeling with recurrent neural networks has shown very good performance, especially for end-to-end speech recognition. However, most recurrent neural networks require sequential computation of the output, which results in large memory access overhead when implemented in embedded devices. Convolution-based sequential modeling does not suffer from this problem; however, the model usually requires a large number of parameters. We propose simple gated convolutional neural networks (Simple Gated ConvNet) for acoustic modeling and show that the network performs very well even when the number of parameters is fairly small, less than 3 million. The Simple Gated ConvNet (SGCN) is constructed by combining the simplest form of Gated ConvNet and one-dimensional (1-D) depthwise convolution. The model has been evaluated using the Wall Street Journal (WSJ) Corpus and has shown a performance competitive to RNN-based ones. The performance of the SGCN has also been evaluated using the LibriSpeech Corpus. The developed model was implemented in ARM CPU based systems and showed the real time factor (RTF) of around 0.05.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003993","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5
Abstract
Acoustic modeling with recurrent neural networks has shown very good performance, especially for end-to-end speech recognition. However, most recurrent neural networks require sequential computation of the output, which incurs large memory-access overhead when implemented on embedded devices. Convolution-based sequence modeling does not suffer from this problem; however, such models usually require a large number of parameters. We propose the Simple Gated ConvNet (SGCN) for acoustic modeling and show that the network performs very well even when the number of parameters is fairly small, less than 3 million. The SGCN is constructed by combining the simplest form of gated ConvNet with one-dimensional (1-D) depthwise convolution. The model has been evaluated on the Wall Street Journal (WSJ) corpus and shows performance competitive with RNN-based models. The performance of the SGCN has also been evaluated on the LibriSpeech corpus. The developed model was implemented on ARM CPU-based systems and achieved a real-time factor (RTF) of around 0.05.
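To make the architecture concrete, below is a minimal sketch of one SGCN-style block: a 1-D depthwise convolution whose output is modulated by a sigmoid gate, i.e., the simplest gated-ConvNet form the abstract describes. This is an illustrative reconstruction, not the authors' released code; the kernel size, the use of a second depthwise convolution for the gate, and the absence of normalization layers are all assumptions.

```python
import torch
import torch.nn as nn


class SimpleGatedConvBlock(nn.Module):
    """Hypothetical SGCN-style block: gated 1-D depthwise convolution.

    Depthwise convolution (groups == channels) uses one filter per
    channel, which is what keeps the parameter count small.
    """

    def __init__(self, channels: int, kernel_size: int = 7):
        super().__init__()
        pad = kernel_size // 2  # "same" padding so time length is preserved
        # Content path: 1-D depthwise convolution over the time axis.
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, groups=channels)
        # Gate path: a second depthwise convolution feeding a sigmoid
        # (an assumption; the paper may share or simplify this path).
        self.gate = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) acoustic features
        return self.conv(x) * torch.sigmoid(self.gate(x))


# Usage sketch: 40-dim filterbank features over 100 frames.
block = SimpleGatedConvBlock(channels=40)
out = block(torch.randn(8, 40, 100))  # -> (8, 40, 100)
```

Note that a depthwise layer needs only `channels * kernel_size` weights per path (plus biases), versus `channels^2 * kernel_size` for a full convolution, which is consistent with the sub-3-million-parameter budget the abstract reports.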