Sphinx-Based Evaluation of Efficient Acoustic Modeling Parameters for LibriSpeech Corpus
S. Sharan, A. Dev, Poonam Bansal, Shweta A. Bansal, S. Agrawal
2022 4th International Conference on Artificial Intelligence and Speech Technology (AIST), 2022-12-09
DOI: https://doi.org/10.1109/AIST55798.2022.10064750
Abstract
In this paper we assess the efficient parameters, i.e., the number of senones and the number of Gaussian densities, for an Automatic Speech Recognition (ASR) system built on the well-known audiobook corpus "LibriSpeech" using the open-source tool Sphinx. Sphinx is a Hidden Markov Model (HMM) based, offline, large-vocabulary, language- and speaker-independent continuous ASR system with support for low-resource handheld devices. We trained the acoustic model while varying these parameters and evaluated the quality of the resulting models using Word Error Rate (WER). The best WER observed was 9.5%, obtained with 2000 senones and 64 Gaussian densities.
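For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used by the authors; the function names `edit_distance`, `word_error_rate`, and `corpus_wer` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance between a reference and a hypothesis."""
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance, using whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def corpus_wer(pairs: Sequence[Tuple[str, str]]) -> float:
    """WER over many (reference, hypothesis) pairs: total errors / total ref words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    ref_len = sum(len(r.split()) for r, _ in pairs)
    if ref_len == 0:
        raise ValueError("references must contain at least one word in total")
    return errors / ref_len


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a WER of 9.5% corresponds to roughly 95 word errors per 1000 reference words. Text normalization (case, punctuation) is left out of the sketch and would need to match on both sides before scoring.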