Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting

2016 IEEE Spoken Language Technology Workshop (SLT). Pub Date: 2016-12-01. DOI: 10.1109/SLT.2016.7846306

Ming Sun, A. Raju, G. Tucker, S. Panchapagesan, Gengshen Fu, Arindam Mandal, S. Matsoukas, N. Strom, S. Vitaladevuni

Citations: 109

Abstract

We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS) with low CPU, memory, and latency requirements. Max-pooling loss training can be further guided by initializing from a network trained with cross-entropy loss. A posterior-smoothing based evaluation approach is used to measure keyword spotting performance. Our experimental results show that LSTM models trained with either cross-entropy loss or max-pooling loss outperform a baseline feed-forward Deep Neural Network (DNN) trained with cross-entropy loss. In addition, a randomly initialized LSTM trained with max-pooling loss performs better than an LSTM trained with cross-entropy loss. Finally, the max-pooling loss trained LSTM initialized from a cross-entropy pre-trained network performs best, yielding a 67.6% relative reduction in the Area Under the Curve (AUC) measure compared to the baseline feed-forward DNN.
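The abstract names two mechanisms without spelling them out: a max-pooling loss and posterior smoothing. The sketch below illustrates one plausible reading of each and is not the authors' implementation. It assumes that, within each labeled keyword segment, only the frame with the highest keyword posterior contributes a cross-entropy term, while every frame outside a keyword segment contributes a background term. It also assumes that smoothing is a causal moving average over per-frame keyword posteriors. The function names, the two-class layout, and the window size are hypothetical.

```python
import math


def max_pooling_loss(posteriors, keyword_segments, keyword_class=1, background_class=0):
    """Illustrative max-pooling loss (assumed formulation, not from the abstract).

    posteriors: list of per-frame lists of class probabilities.
    keyword_segments: list of (start, end) frame index pairs, end exclusive.
    Returns (loss, selected_frames).
    """
    in_keyword = [False] * len(posteriors)
    selected = []
    loss = 0.0
    for start, end in keyword_segments:
        for t in range(start, end):
            in_keyword[t] = True
        # Max pooling over time: keep only the frame where the keyword
        # posterior is highest and ignore the rest of the segment.
        best = max(range(start, end), key=lambda t: posteriors[t][keyword_class])
        selected.append(best)
        loss -= math.log(posteriors[best][keyword_class])
    # Assumption: frames outside keyword segments all contribute as background.
    for t, frame in enumerate(posteriors):
        if not in_keyword[t]:
            loss -= math.log(frame[background_class])
    return loss, selected


def smooth_posteriors(scores, window):
    """Causal moving average over the last `window` frames (assumed smoothing)."""
    smoothed = []
    running = 0.0
    for t, score in enumerate(scores):
        running += score
        if t >= window:
            running -= scores[t - window]
        smoothed.append(running / min(t + 1, window))
    return smoothed
```

Under these assumptions, a detector would threshold the smoothed keyword posterior, and sweeping that threshold traces the trade-off curve that an AUC figure summarizes.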
