Fast variable-frame-rate decoding of speech recognition based on deep neural networks

Ge Zhang, Pengyuan Zhang, Jielin Pan, Yonghong Yan
Published in: 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)
Publication date: 2017-07-29
DOI: 10.1109/FSKD.2017.8393381

Abstract

Deep neural networks (DNNs) have recently shown impressive performance as acoustic models for large vocabulary continuous speech recognition (LVCSR) tasks. Typically, the frame shift of the neural network output is much shorter than the average length of the modeling units, so the posterior vectors of neighbouring frames are likely to be similar. This similarity, together with the stronger discrimination of neural networks compared with conventional acoustic models, suggests that frames of neural network output can be removed based on the distance between posterior vectors, effectively reducing the computation cost of beam search. Building on this observation, the paper introduces a novel variable-frame-rate decoding approach based on neural network computation that accelerates beam search for speech recognition with only a minor loss of accuracy. By computing the distances between posterior vectors and removing frames whose posterior vector is similar to that of the previous frame, the approach exploits the redundant information between frames and performs beam search much more quickly. Experiments on LVCSR tasks show a 2.4× speedup of decoding compared to a typical framewise decoding implementation.
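The core idea of the frame-removal step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the distance measure (Euclidean here) and the threshold value are assumptions, and the function name is hypothetical. A frame is kept for beam search only if its posterior vector differs sufficiently from the last kept frame.

```python
import numpy as np

def drop_similar_frames(posteriors, threshold=0.1):
    """Return indices of frames to feed to beam search.

    A frame is skipped when its posterior vector is close to that of
    the last *kept* frame, exploiting inter-frame redundancy.
    Euclidean distance is an assumed stand-in for the paper's measure.
    """
    kept = [0]  # always keep the first frame
    for t in range(1, len(posteriors)):
        dist = np.linalg.norm(posteriors[t] - posteriors[kept[-1]])
        if dist >= threshold:
            kept.append(t)
    return kept

# Toy example: 3-class posteriors over 5 frames.
post = np.array([[0.90, 0.05, 0.05],
                 [0.89, 0.06, 0.05],   # near-duplicate of frame 0 -> skipped
                 [0.10, 0.80, 0.10],
                 [0.11, 0.79, 0.10],   # near-duplicate of frame 2 -> skipped
                 [0.20, 0.20, 0.60]])
print(drop_similar_frames(post, threshold=0.1))  # -> [0, 2, 4]
```

With half the frames skipped, the beam search only expands hypotheses on the kept frames, which is where the reported 2.4× decoding speedup would come from.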