长短期记忆网络的最大池损失训练用于小内存占用关键字识别

2016 IEEE Spoken Language Technology Workshop (SLT) Pub Date : 2016-12-01 DOI:10.1109/SLT.2016.7846306

Ming Sun, A. Raju, G. Tucker, S. Panchapagesan, Gengshen Fu, Arindam Mandal, S. Matsoukas, N. Strom, S. Vitaladevuni

{"title":"长短期记忆网络的最大池损失训练用于小内存占用关键字识别","authors":"Ming Sun, A. Raju, G. Tucker, S. Panchapagesan, Gengshen Fu, Arindam Mandal, S. Matsoukas, N. Strom, S. Vitaladevuni","doi":"10.1109/SLT.2016.7846306","DOIUrl":null,"url":null,"abstract":"We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, max-pooling loss trained LSTM with randomly initialized network performs better compared to cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, which yields 67:6% relative reduction compared to baseline feed-forward DNN in Area Under the Curve (AUC) measure.","PeriodicalId":281635,"journal":{"name":"2016 IEEE Spoken Language Technology Workshop (SLT)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"109","resultStr":"{\"title\":\"Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting\",\"authors\":\"Ming Sun, A. Raju, G. Tucker, S. Panchapagesan, Gengshen Fu, Arindam Mandal, S. Matsoukas, N. Strom, S. Vitaladevuni\",\"doi\":\"10.1109/SLT.2016.7846306\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, max-pooling loss trained LSTM with randomly initialized network performs better compared to cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, which yields 67:6% relative reduction compared to baseline feed-forward DNN in Area Under the Curve (AUC) measure.\",\"PeriodicalId\":281635,\"journal\":{\"name\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"volume\":\"46 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"109\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 IEEE Spoken Language Technology Workshop (SLT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SLT.2016.7846306\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2016.7846306","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 109

摘要

我们提出了一个基于最大池的损失函数，用于训练长短期记忆(LSTM)网络，用于小占用的关键字定位(KWS)，具有低CPU，内存和延迟要求。通过交叉熵损失训练网络的初始化，可以进一步指导最大池化损失训练。采用一种基于后验平滑的评价方法来衡量关键词识别性能。我们的实验结果表明，使用交叉熵损失或最大池化损失训练的LSTM模型优于交叉熵损失训练的基线前馈深度神经网络(DNN)。此外，随机初始化网络的最大池损失训练LSTM比交叉熵损失训练LSTM性能更好。最后，用交叉熵预训练网络初始化的最大池损失训练LSTM表现出最好的性能，在曲线下面积(Area Under the Curve, AUC)测量中，与基线前馈深度神经网络相比，其相对降低了67:6%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Max-pooling loss training of long short-term memory networks for small-footprint keyword spotting

We propose a max-pooling based loss function for training Long Short-Term Memory (LSTM) networks for small-footprint keyword spotting (KWS), with low CPU, memory, and latency requirements. The max-pooling loss training can be further guided by initializing with a cross-entropy loss trained network. A posterior smoothing based evaluation approach is employed to measure keyword spotting performance. Our experimental results show that LSTM models trained using cross-entropy loss or max-pooling loss outperform a cross-entropy loss trained baseline feed-forward Deep Neural Network (DNN). In addition, max-pooling loss trained LSTM with randomly initialized network performs better compared to cross-entropy loss trained LSTM. Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, which yields 67:6% relative reduction compared to baseline feed-forward DNN in Area Under the Curve (AUC) measure.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2016 IEEE Spoken Language Technology Workshop (SLT)

自引率

0.00%

发文量