Isolated Word Automatic Speech Recognition (ASR) System Using MFCC, DTW & KNN

2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast), Pub Date: 2016-11-01, DOI: 10.1109/APMEDIACAST.2016.7878163

Muhammad Atif Imtiaz, G. Raja


Citations: 24

Abstract

An Automatic Speech Recognition (ASR) system transforms acoustic speech signals into a string of words. This paper presents an isolated-word ASR system built on Mel-Frequency Cepstral Coefficients (MFCCs), Dynamic Time Warping (DTW), and K-Nearest Neighbor (KNN) techniques. The Mel-frequency scale is used to capture the significant characteristics of the speech signal, and speech features are extracted as MFCCs. DTW is applied for speech feature matching, and KNN is employed as the classifier. The experimental setup uses English words collected from five speakers and recorded in an acoustically balanced, noise-free environment. The experimental results of the proposed ASR system are reported as a confusion matrix. The recognition accuracy achieved in this research is 98.4%.
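The matching and classification stages named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes MFCC extraction has already turned each utterance into a sequence of feature vectors, uses the textbook DTW recurrence with a Euclidean frame distance, and votes among the k nearest templates. The names `dtw_distance`, `knn_classify`, and the toy one-dimensional "features" are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Allowed steps: advance in a only, in b only, or in both
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def knn_classify(query, templates, k=3):
    """Majority vote among the k templates closest to query under DTW.

    templates is a list of (label, sequence) pairs.
    """
    ranked = sorted(templates, key=lambda t: dtw_distance(query, t[1]))
    votes = Counter(label for label, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for per-frame MFCC vectors (assumption: real features
# would have roughly a dozen coefficients per frame, not one).
templates = [
    ("yes", [[0.0], [1.0], [2.0], [1.0]]),
    ("yes", [[0.0], [1.0], [1.0], [2.0], [1.0]]),
    ("no", [[2.0], [1.0], [0.0]]),
    ("no", [[2.0], [2.0], [1.0], [0.0]]),
]
query = [[0.0], [0.0], [1.0], [2.0], [1.0]]  # a slower rendition of "yes"

print(knn_classify(query, templates, k=3))
```

DTW lets the slower query align with the shorter "yes" templates at zero cost because repeated frames can map to a single template frame, which is why it suits isolated words spoken at different rates. The quadratic cost table is adequate for short utterances; a banded variant would be a natural refinement for longer ones.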
