Towards Lexicon-free Bangla Automatic Speech Recognition System

2019 International Conference on Bangla Speech and Language Processing (ICBSLP) Pub Date : 2019-09-01 DOI:10.1109/ICBSLP47725.2019.201544

Md. Hasan, Md. Ariful Islam, Shafkat Kibria, Mohammad Shahidur Rahman

{"title":"Towards Lexicon-free Bangla Automatic Speech Recognition System","authors":"Md. Hasan, Md. Ariful Islam, Shafkat Kibria, Mohammad Shahidur Rahman","doi":"10.1109/ICBSLP47725.2019.201544","DOIUrl":null,"url":null,"abstract":"This article presents a lexicon-free Automatic Speech Recognition (ASR) system for the Bangla language and investigates an open-source large Bangla ASR corpus, which proved by OpenSLR. The model has been trained using improved MFCC acoustic features with a deep LSTM as an acoustic model. We have tried two types of decoding techniques in the decoding or the last part of the ASR; one is using a joint decoder of Connectionist Temporal Classification (CTC) and a statistical Language Model (LM) for beam decoding, and another is CTC based greedy decoding. We have trained and investigated the performance of our ASR with non-augmented speech as an input. The achieved results are outstanding compares to the results obtained from past researches that have used the End-to-End approaches for Bangla ASR. On the test dataset, our End-to-End system has obtained different results using two distinct decoders. The obtained results are 39.61% WER and 18.50% CER using the greedy decoder and 27.89% WER and 12.31% CER, which are a little bit improved results, using the beam decoder. This achievement is state of the art for continuous Bangla ASR.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICBSLP47725.2019.201544","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 3

Abstract

This article presents a lexicon-free Automatic Speech Recognition (ASR) system for the Bangla language and investigates an open-source large Bangla ASR corpus, which proved by OpenSLR. The model has been trained using improved MFCC acoustic features with a deep LSTM as an acoustic model. We have tried two types of decoding techniques in the decoding or the last part of the ASR; one is using a joint decoder of Connectionist Temporal Classification (CTC) and a statistical Language Model (LM) for beam decoding, and another is CTC based greedy decoding. We have trained and investigated the performance of our ASR with non-augmented speech as an input. The achieved results are outstanding compares to the results obtained from past researches that have used the End-to-End approaches for Bangla ASR. On the test dataset, our End-to-End system has obtained different results using two distinct decoders. The obtained results are 39.61% WER and 18.50% CER using the greedy decoder and 27.89% WER and 12.31% CER, which are a little bit improved results, using the beam decoder. This achievement is state of the art for continuous Bangla ASR.

查看原文本刊更多论文

无词典孟加拉语自动语音识别系统

本文提出了一个无词典的孟加拉语自动语音识别(ASR)系统，并研究了一个开源的大型孟加拉语自动语音识别语料库，该语料库经过OpenSLR的验证。该模型使用改进的MFCC声学特征进行训练，并使用深度LSTM作为声学模型。在ASR的解码或最后部分，我们尝试了两种类型的解码技术;一种是使用连接时间分类(CTC)和统计语言模型(LM)联合解码器进行波束译码，另一种是基于连接时间分类的贪婪译码。我们已经训练并研究了非增强语音作为输入的ASR的性能。与过去使用端到端方法进行孟加拉国ASR的研究结果相比，取得的结果是突出的。在测试数据集上，我们的端到端系统使用两个不同的解码器获得了不同的结果。使用贪婪解码器和束流解码器分别获得了39.61%和18.50%的译码率和27.89%和12.31%的译码率，两者的译码率略有提高。这一成就是孟加拉国持续ASR的最新水平。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2019 International Conference on Bangla Speech and Language Processing (ICBSLP)

自引率

0.00%

发文量