Fast variable-frame-rate decoding of speech recognition based on deep neural networks
Ge Zhang, Pengyuan Zhang, Jielin Pan, Yonghong Yan
2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), July 2017
DOI: 10.1109/FSKD.2017.8393381
Citations: 0
Abstract
Deep neural networks (DNNs) have recently shown impressive performance as acoustic models for large vocabulary continuous speech recognition (LVCSR) tasks. Typically, the frame shift of the neural network output is much shorter than the average length of the modeling units, so the posterior vectors of neighbouring frames are likely to be similar. This similarity, together with the stronger discrimination of neural networks compared with conventional acoustic models, suggests that frames of the neural network output can be removed according to the distance between posterior vectors, effectively reducing the computational cost of beam search. Based on this observation, the paper introduces a novel variable-frame-rate decoding approach, built on the neural network computation, that accelerates beam search for speech recognition with minor loss of accuracy. By computing the distances between posterior vectors and removing frames whose posterior vector is similar to that of the previous frame, the approach exploits the redundancy between frames and performs beam search much more quickly. Experiments on LVCSR tasks show a 2.4-times speedup of decoding compared with a typical framewise decoding implementation.
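The core idea of the frame-removal step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Euclidean distance metric, the `threshold` value, and the function name are assumptions, since the abstract does not specify which distance measure or threshold is used.

```python
import numpy as np

def drop_similar_frames(posteriors, threshold=0.5):
    """Variable-frame-rate selection sketch: keep a frame only if its
    posterior vector differs from the last *kept* frame by more than
    `threshold`. Distance metric and threshold are illustrative
    assumptions, not the paper's exact choices.

    posteriors: (T, N) array of per-frame posterior vectors.
    Returns the indices of the frames passed on to beam search.
    """
    kept = [0]  # always keep the first frame
    for t in range(1, len(posteriors)):
        # Compare against the last kept frame, not simply t-1,
        # so a slow drift cannot discard every frame.
        if np.linalg.norm(posteriors[t] - posteriors[kept[-1]]) > threshold:
            kept.append(t)
    return kept

# Example: frames 1 and 3 duplicate their predecessors and are dropped,
# so only frames 0, 2, and 4 are forwarded to the beam search.
post = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
print(drop_similar_frames(post))  # → [0, 2, 4]
```

Because the decoder then only expands hypotheses at the kept frames, the cost of beam search scales with the number of surviving frames rather than the full frame rate, which is the source of the reported 2.4-times speedup.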