Simplified LSTMs for Speech Recognition

G. Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, M. Picheny, Samuel Thomas
{"title":"用于语音识别的简化LSTMS","authors":"G. Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, M. Picheny, Samuel Thomas","doi":"10.1109/ASRU46091.2019.9003898","DOIUrl":null,"url":null,"abstract":"In this paper we explore new variants of Long Short-Term Memory (LSTM) networks for sequential modeling of acoustic features. In particular, we show that: (i) removing the output gate, (ii) replacing the hyperbolic tangent nonlinearity at the cell output with hard tanh, and (iii) collapsing the cell and hidden state vectors leads to a model that is conceptually simpler than and comparable in effectiveness to a regular LSTM for speech recognition. The proposed model has 25% fewer parameters than an LSTM with the same number of cells, trains faster because it has larger gradients leading to larger steps in weight space, and reaches a better optimum because there are fewer nonlinearities to traverse across layers. We report experimental results for both hybrid and CTC acoustic models on three publicly available English datasets: Switchboard 300 hours telephone conversations, 400 hours broadcast news transcription, and the MALACH 176 hours corpus of Holocaust survivor testimonies. In all cases the proposed models achieve similar or better accuracy than regular LSTMs while being conceptually simpler.","PeriodicalId":150913,"journal":{"name":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Simplified LSTMS for Speech Recognition\",\"authors\":\"G. Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, M. 
Picheny, Samuel Thomas\",\"doi\":\"10.1109/ASRU46091.2019.9003898\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we explore new variants of Long Short-Term Memory (LSTM) networks for sequential modeling of acoustic features. In particular, we show that: (i) removing the output gate, (ii) replacing the hyperbolic tangent nonlinearity at the cell output with hard tanh, and (iii) collapsing the cell and hidden state vectors leads to a model that is conceptually simpler than and comparable in effectiveness to a regular LSTM for speech recognition. The proposed model has 25% fewer parameters than an LSTM with the same number of cells, trains faster because it has larger gradients leading to larger steps in weight space, and reaches a better optimum because there are fewer nonlinearities to traverse across layers. We report experimental results for both hybrid and CTC acoustic models on three publicly available English datasets: Switchboard 300 hours telephone conversations, 400 hours broadcast news transcription, and the MALACH 176 hours corpus of Holocaust survivor testimonies. 
In all cases the proposed models achieve similar or better accuracy than regular LSTMs while being conceptually simpler.\",\"PeriodicalId\":150913,\"journal\":{\"name\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASRU46091.2019.9003898\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU46091.2019.9003898","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

In this paper we explore new variants of Long Short-Term Memory (LSTM) networks for sequential modeling of acoustic features. In particular, we show that: (i) removing the output gate, (ii) replacing the hyperbolic tangent nonlinearity at the cell output with hard tanh, and (iii) collapsing the cell and hidden state vectors leads to a model that is conceptually simpler than and comparable in effectiveness to a regular LSTM for speech recognition. The proposed model has 25% fewer parameters than an LSTM with the same number of cells, trains faster because it has larger gradients leading to larger steps in weight space, and reaches a better optimum because there are fewer nonlinearities to traverse across layers. We report experimental results for both hybrid and CTC acoustic models on three publicly available English datasets: Switchboard 300 hours telephone conversations, 400 hours broadcast news transcription, and the MALACH 176 hours corpus of Holocaust survivor testimonies. In all cases the proposed models achieve similar or better accuracy than regular LSTMs while being conceptually simpler.
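The three modifications described above (no output gate, hard tanh at the cell output, and a single merged cell/hidden state) can be sketched as one recurrent step. The NumPy snippet below is a minimal illustration of that idea, not the authors' implementation; the variable names, gate layout, and weight shapes are assumptions. With three gated transforms instead of the regular LSTM's four, the parameter count drops by the 25% the abstract cites.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hard_tanh(z):
    # Piecewise-linear replacement for tanh: identity on [-1, 1], clipped outside.
    return np.clip(z, -1.0, 1.0)

def simplified_lstm_step(x, h, params):
    """One step of the simplified cell: no output gate, hard tanh at the
    output, and the cell state collapsed into the hidden state h."""
    Wi, Ui, bi, Wf, Uf, bf, Wg, Ug, bg = params
    i = sigmoid(Wi @ x + Ui @ h + bi)   # input gate
    f = sigmoid(Wf @ x + Uf @ h + bf)   # forget gate
    g = np.tanh(Wg @ x + Ug @ h + bg)   # candidate update
    # Merged cell/hidden state; hard tanh keeps it bounded in [-1, 1].
    return hard_tanh(f * h + i * g)
```

Because the update uses three weight blocks (input, forget, candidate) rather than four, a layer with the same number of cells carries three quarters of a regular LSTM's recurrent parameters, matching the abstract's 25% reduction.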