Towards Lexicon-free Bangla Automatic Speech Recognition System
Md. Hasan, Md. Ariful Islam, Shafkat Kibria, Mohammad Shahidur Rahman
2019 International Conference on Bangla Speech and Language Processing (ICBSLP), published 2019-09-01
DOI: 10.1109/ICBSLP47725.2019.201544 (https://doi.org/10.1109/ICBSLP47725.2019.201544)
Citations: 3
Abstract
This article presents a lexicon-free Automatic Speech Recognition (ASR) system for the Bangla language and investigates a large open-source Bangla ASR corpus provided by OpenSLR. The model is trained on improved MFCC acoustic features, with a deep LSTM as the acoustic model. We tried two decoding techniques in the final stage of the ASR pipeline: beam decoding with a joint decoder that combines Connectionist Temporal Classification (CTC) and a statistical Language Model (LM), and CTC-based greedy decoding. We trained the system and evaluated its performance with non-augmented speech as input. The results are markedly better than those of previous research that used End-to-End approaches for Bangla ASR. On the test dataset, our End-to-End system obtains 39.61% word error rate (WER) and 18.50% character error rate (CER) with the greedy decoder, and the beam decoder improves on this with 27.89% WER and 12.31% CER. This result is state of the art for continuous Bangla ASR.
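The greedy CTC decoding and the WER/CER metrics mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the blank index `0`, the function names, and the toy inputs are all assumptions made for illustration, and CER is computed here over Unicode code points with spaces included, which may differ from the convention used in the paper.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      blank: int = BLANK) -> List[int]:
    """Greedy CTC decoding: best label per frame, merge repeats, drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    collapsed = [label for label, _ in groupby(best_path)]
    return [label for label in collapsed if label != blank]


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences, using a single DP row."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        prev_diag, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            cur = min(row[j] + 1,              # deletion
                      row[j - 1] + 1,          # insertion
                      prev_diag + (r != h))    # substitution or match
            prev_diag, row[j] = row[j], cur
    return row[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate over code points (assumption: spaces are counted)."""
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Toy scores over three labels: 0 = blank, 1 and 2 = characters.
    scores = [[0.1, 0.8, 0.1], [0.1, 0.7, 0.2], [0.9, 0.05, 0.05],
              [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
    print(ctc_greedy_decode(scores))
    print(wer("a b c d", "a x c"), cer("kitten", "sitting"))
```

The blank between the second and fourth frames is what lets the decoder emit label 1 twice in a row. A beam decoder with a language model would instead keep several candidate prefixes per frame and rescore them, which this sketch does not attempt.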