Md. Hasan, Md. Ariful Islam, Shafkat Kibria, Mohammad Shahidur Rahman
2019 International Conference on Bangla Speech and Language Processing (ICBSLP), published 2019-09-01. DOI: 10.1109/ICBSLP47725.2019.201544
Towards Lexicon-free Bangla Automatic Speech Recognition System
This article presents a lexicon-free Automatic Speech Recognition (ASR) system for the Bangla language and investigates a large open-source Bangla ASR corpus provided by OpenSLR. The model is trained on improved MFCC acoustic features, with a deep LSTM as the acoustic model. We evaluate two decoding techniques in the final stage of the ASR pipeline: beam decoding with a joint decoder that combines Connectionist Temporal Classification (CTC) with a statistical Language Model (LM), and CTC-based greedy decoding. We train and evaluate the system on non-augmented speech input. The results are outstanding compared with those of earlier work that used End-to-End approaches for Bangla ASR. On the test dataset, our End-to-End system obtains 39.61% WER and 18.50% CER with the greedy decoder, and an improved 27.89% WER and 12.31% CER with the beam decoder. This result is state of the art for continuous Bangla ASR.
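The greedy decoding step and the two reported metrics can be illustrated with a minimal Python sketch. This is not the authors' implementation. The blank index, the toy alphabet, and the function names `ctc_greedy_decode`, `edit_distance`, `word_error_rate` and `char_error_rate` are assumptions made for illustration. The sketch also counts characters as Unicode code points and includes spaces in the CER, which may differ from how the paper scores Bangla text.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      alphabet: Sequence[str],
                      blank: int = BLANK) -> str:
    """Take the best label per frame, merge repeats, then drop blanks."""
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_scores]
    chars: List[str] = []
    previous = None
    for label in best_path:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank symbol.
        if label != previous and label != blank:
            chars.append(alphabet[label])
        previous = label
    return "".join(chars)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences, using one rolling row."""
    row = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        diagonal, row[0] = row[0], i
        for j, h in enumerate(hyp, start=1):
            diagonal, row[j] = row[j], min(
                row[j] + 1,            # deletion
                row[j - 1] + 1,        # insertion
                diagonal + (r != h),   # substitution (or match, cost 0)
            )
    return row[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER: word-level edit distance divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """CER: character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A beam decoder with a language model would instead keep several candidate prefixes per frame and rescore them with LM probabilities. The greedy version above keeps only the single best label per frame, which is the simpler of the two decoders compared in the abstract.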