Isolated word Automatic Speech Recognition (ASR) System using MFCC, DTW & KNN
Authors: Muhammad Atif Imtiaz, G. Raja
Published in: 2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast)
Publication date: 2016-11-01
DOI: https://doi.org/10.1109/APMEDIACAST.2016.7878163
Citations: 24
Abstract
An Automatic Speech Recognition (ASR) system transforms acoustic speech signals into a string of words. This paper presents an isolated-word ASR system built on Mel-Frequency Cepstral Coefficients (MFCCs), Dynamic Time Warping (DTW), and K-Nearest Neighbor (KNN) techniques. The mel-frequency scale is used to capture the significant characteristics of the speech signal, and speech features are extracted as MFCCs. DTW is applied for speech feature matching, and KNN is employed as the classifier. The experimental setup uses English words collected from five speakers, spoken in an acoustically balanced, noise-free environment. The experimental results of the proposed ASR system are reported as a confusion matrix. The recognition accuracy achieved in this research is 98.4%.
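The matching and classification stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the local distance, the DTW step pattern, or the value of k, so the Euclidean frame distance, the symmetric three-way step, and `k=3` below are all assumptions. The helper names (`dtw_distance`, `knn_predict`, `make_seq`) are hypothetical, and synthetic ramp sequences stand in for real MFCC matrices of shape (frames, coefficients).

```python
import numpy as np
from collections import Counter


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of shape (frames, coeffs)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # Euclidean distance between every frame of a and every frame of b
    # (assumed local distance; the abstract does not specify one).
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Symmetric step pattern: insertion, deletion, or match.
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]


def knn_predict(query, templates, labels, k=3):
    """Label a query by majority vote among its k DTW-nearest templates."""
    dists = [dtw_distance(query, t) for t in templates]
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def make_seq(length, rising):
    """Synthetic 2-coefficient sequence standing in for an MFCC matrix."""
    t = np.linspace(0.0, 1.0, length)
    ramp = t if rising else 1.0 - t
    return np.stack([ramp, ramp ** 2], axis=1)


# Reference templates of different durations for two hypothetical words.
templates = [
    make_seq(20, True), make_seq(30, True),
    make_seq(20, False), make_seq(30, False),
]
labels = ["yes", "yes", "no", "no"]

# A query whose length matches no template; DTW absorbs the difference.
query = make_seq(25, True)
print(knn_predict(query, templates, labels, k=3))
```

Because DTW aligns sequences of unequal length, the query needs no resampling before comparison, which is the property that makes it suitable for matching words spoken at different speeds. In a real pipeline, each template would be the MFCC matrix of a recorded word rather than a synthetic ramp.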