Long Short-Term Memory Based Language Model for Indonesian Spontaneous Speech Recognition

Fanda Yuliana Putri, D. Lestari, D. H. Widyantoro
DOI: 10.1109/IC3INA.2018.8629500
Published in: 2018 International Conference on Computer, Control, Informatics and its Applications (IC3INA), November 2018
Citations: 4

Abstract

Robust recognition of daily, spontaneous conversation is necessary for a speech recognizer deployed in real-world applications. Meanwhile, Indonesian automatic speech recognition (ASR) still performs poorly on spontaneous speech compared to dictated speech. In this work, we used a deep neural network approach, focusing primarily on long short-term memory (LSTM) to improve language model performance, as LSTM has been successfully applied to many long context-dependent problems, including language modeling. We tried different architectures and parameters to find the optimal combination, including deep LSTMs and LSTMs with a projection layer (LSTMP). Thereafter, a different type of corpus was employed to enrich the language model linguistically. All our LSTM language models achieved significant improvements in perplexity and word error rate (%WER) over the n-gram baseline. The perplexity improvement was up to 50.6%, and the best WER reduction was 3.61%, as evaluated with a triphone GMM-HMM acoustic model. The optimal architecture combination we found is a deep LSTMP with L2 regularization.
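For readers unfamiliar with the evaluation metric: perplexity is the exponentiated average negative log-likelihood a language model assigns to held-out text, and the 50.6% figure above is a relative reduction against the n-gram baseline. A minimal sketch of both computations (the per-token probabilities below are illustrative, not from the paper):

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood
    over the tokens of a held-out corpus."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

def relative_improvement(baseline_ppl, model_ppl):
    """Relative perplexity reduction, the form of improvement
    reported in the abstract."""
    return (baseline_ppl - model_ppl) / baseline_ppl

# Hypothetical per-token probabilities from two models on the same text
ngram_probs = [0.05, 0.10, 0.02, 0.08]
lstm_probs = [0.15, 0.20, 0.05, 0.12]

ppl_ngram = perplexity(ngram_probs)
ppl_lstm = perplexity(lstm_probs)
print(relative_improvement(ppl_ngram, ppl_lstm))
```

A lower perplexity means the model spreads less probability mass away from the words that actually occur, which is why perplexity gains tend to track (though not one-to-one) WER gains in ASR rescoring.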
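The abstract contrasts deep LSTMs with the projection-layer variant (LSTMP), in which the full hidden state is projected down to a smaller recurrent state before feeding back into the gates, reducing the recurrent parameter count. The paper gives no code; the following pure-Python single-step sketch (toy dimensions, random weights, all names hypothetical) shows where the projection sits relative to a standard LSTM cell:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # W is a list of rows; returns the product W @ v
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vadd(*vectors):
    return [sum(xs) for xs in zip(*vectors)]

def lstmp_step(x, r_prev, c_prev, params):
    """One step of an LSTM cell with a recurrent projection layer (LSTMP).

    The full hidden state h (size n) is projected down to r (size p < n),
    and only r is fed back into the gates, shrinking each recurrent weight
    matrix from n x n to n x p."""
    pre = [vadd(matvec(Wx, x), matvec(Wr, r_prev), b)
           for Wx, Wr, b in params["gates"]]  # input, forget, output, candidate
    i = [sigmoid(v) for v in pre[0]]          # input gate
    f = [sigmoid(v) for v in pre[1]]          # forget gate
    o = [sigmoid(v) for v in pre[2]]          # output gate
    g = [math.tanh(v) for v in pre[3]]        # candidate cell values
    c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    r = matvec(params["W_proj"], h)           # projection down to size p
    return r, c

# Toy dimensions (illustrative only): input d=3, cell n=4, projection p=2
random.seed(0)
def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

d, n, p = 3, 4, 2
params = {
    "gates": [(rand_mat(n, d), rand_mat(n, p), [0.0] * n) for _ in range(4)],
    "W_proj": rand_mat(p, n),
}
r, c = lstmp_step([1.0, 0.5, -0.5], [0.0] * p, [0.0] * n, params)
print(len(r), len(c))  # recurrent state has size p, cell state size n
```

With the projection in place, stacking several such layers ("deep LSTMP", as in the paper's best configuration) keeps the recurrent connections cheap while the cell size stays large.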