Performance Analysis of Implemented MFCC and HMM-based Speech Recognition System
Marlyn Maseri, Mazlina Mamat
2020 IEEE 2nd International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), 2020-09-26
DOI: 10.1109/IICAIET49801.2020.9257823 (https://doi.org/10.1109/IICAIET49801.2020.9257823)
Citations: 1
Abstract
This paper analyzes the performance of a speech recognition system whose front end extracts Mel-frequency cepstral coefficient (MFCC) features and whose back end performs recognition with hidden Markov models (HMMs). The dataset covers 30 phonemes and contains 2200 utterances from different speakers. Each speech signal is sampled at 16 kHz and stored as 16-bit PCM in a mono channel format. The features extracted from each signal form 39-dimensional vectors: 12 mel-cepstral coefficients and log energy, together with their delta (first-order derivative) and acceleration (second-order derivative) coefficients. The Baum-Welch algorithm is used for HMM training and the Viterbi algorithm for decoding. The overall recognition accuracy of the system in this experiment is 95.00%.
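As a rough illustration of where the figure of 39 features comes from: 12 cepstral coefficients plus log energy give 13 static values per frame, and appending their delta and acceleration coefficients gives 13 × 3 = 39. The sketch below assumes the common regression-style delta over ±2 frames with replicated edge frames; the abstract does not state which formula, window, or edge handling the authors used, and the names `deltas` and `stack_features` are hypothetical.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based time derivative over +/- `window` frames.

    Assumed formula (not taken from the paper):
        d[t] = sum_{n=1..W} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..W} n^2)
    Frames beyond either end are replaced by the first or last frame.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[t])):
            num = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                num += n * (ahead - behind)
            row.append(num / denom)
        out.append(row)
    return out


def stack_features(static: List[Frame]) -> List[Frame]:
    """Append delta and acceleration coefficients to each static frame."""
    delta = deltas(static)
    accel = deltas(delta)  # acceleration = delta of the delta
    return [s + d + a for s, d, a in zip(static, delta, accel)]


# Toy input: 6 frames of 13 static values (12 cepstra + log energy),
# each value rising by 1 per frame so the interior delta is exactly 1.
static_frames = [[float(t)] * 13 for t in range(6)]
features = stack_features(static_frames)
print(len(features), len(features[0]))
```

With 13 static values per frame, every output frame has 39 entries: indices 0 to 12 are the static features, 13 to 25 the deltas, and 26 to 38 the accelerations.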