Title: Long Short-Term Memory Based Language Model for Indonesian Spontaneous Speech Recognition
Authors: Fanda Yuliana Putri, D. Lestari, D. H. Widyantoro
DOI: 10.1109/IC3INA.2018.8629500 (https://doi.org/10.1109/IC3INA.2018.8629500)
Published in: 2018 International Conference on Computer, Control, Informatics and its Applications (IC3INA), November 2018
Citations: 4
Abstract
Robust recognition of daily, spontaneous conversation is necessary for a speech recognizer deployed in real-world applications. However, Indonesian automatic speech recognition (ASR) still performs poorly on spontaneous speech compared to dictated speech. In this work, we used a deep neural network approach, focusing primarily on long short-term memory (LSTM) networks to improve language model performance, as LSTMs have been successfully applied to many problems with long-range context dependencies, including language modeling. We tried different architectures and parameters to find the optimal combination, including deep LSTMs and LSTMs with a projection layer (LSTMP). We then employed a different type of corpus to enrich the language model linguistically. All our LSTM language models achieved significant improvements in perplexity and word error rate (WER) over the n-gram baseline: perplexity improved by up to 50.6%, and the best WER reduction was 3.61%, as evaluated with a triphone GMM-HMM acoustic model. The optimal architecture combination we found was a deep LSTMP with L2 regularization.
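As a note on the evaluation metric: perplexity is the exponentiated average negative log-likelihood a language model assigns to held-out text, so a 50.6% improvement means the LSTM model's perplexity is roughly half the n-gram baseline's. A minimal sketch of how perplexity and the relative improvement are computed (the per-token probabilities below are illustrative placeholders, not values from the paper):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood over the tokens."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Illustrative per-token natural-log probabilities for one held-out sentence
# under a hypothetical n-gram baseline and a hypothetical LSTM language model.
ngram_lp = [math.log(p) for p in [0.05, 0.02, 0.10, 0.03]]
lstm_lp = [math.log(p) for p in [0.12, 0.08, 0.20, 0.09]]

ppl_ngram = perplexity(ngram_lp)
ppl_lstm = perplexity(lstm_lp)

# Relative perplexity improvement, the quantity the abstract reports
# (up to 50.6% for the paper's models).
improvement = 100.0 * (ppl_ngram - ppl_lstm) / ppl_ngram
print(f"n-gram PPL={ppl_ngram:.1f}, LSTM PPL={ppl_lstm:.1f}, "
      f"improvement={improvement:.1f}%")
```

A lower perplexity means the model spreads less probability mass over wrong continuations; a uniform model over a 4-word vocabulary, for instance, has perplexity exactly 4.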