Isolated word Automatic Speech Recognition (ASR) System using MFCC, DTW & KNN

2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast) Pub Date: 2016-11-01 DOI: 10.1109/APMEDIACAST.2016.7878163

Muhammad Atif Imtiaz, G. Raja

Citations: 24

Abstract

Automatic Speech Recognition (ASR) is the transformation of acoustic speech signals into a string of words. This paper presents an isolated-word ASR system based on Mel-Frequency Cepstral Coefficients (MFCCs), Dynamic Time Warping (DTW), and K-Nearest Neighbor (KNN) techniques. The Mel-frequency scale is used to capture the significant characteristics of the speech signals; speech features are extracted using MFCCs. DTW is applied for speech feature matching, and KNN is employed as the classifier. The experimental setup includes English words collected from five speakers. The words were spoken in an acoustically balanced, noise-free environment. The experimental results of the proposed ASR system are reported as a confusion matrix. The recognition accuracy achieved in this research is 98.4%.
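The matching and classification stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the number of coefficients, the frame distance, or the value of k, so 13 coefficients, Euclidean frame distance, and k = 3 are assumptions made here. MFCC extraction is also skipped; the hypothetical helper `make_seq` produces synthetic feature sequences of shape (frames, coefficients) that stand in for real MFCCs.

```python
from collections import Counter

import numpy as np


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of shape (frames, coeffs)."""
    n, m = len(a), len(b)
    # Euclidean distance between every frame of a and every frame of b.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor cells.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m]


def knn_classify(query, templates, k=3):
    """Majority vote among the k templates closest to query under DTW.

    templates is a list of (label, sequence) pairs.
    """
    scored = [(dtw_distance(query, seq), label) for label, seq in templates]
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def make_seq(start, stop, frames, coeffs=13):
    """Hypothetical stand-in for MFCC extraction: a linear ramp per coefficient."""
    ramp = np.linspace(start, stop, frames)[:, None]
    return np.tile(ramp, (1, coeffs))


# Two synthetic "words", each recorded at two different speaking rates.
templates = [
    ("up", make_seq(0.0, 1.0, 20)),
    ("up", make_seq(0.0, 1.0, 30)),
    ("down", make_seq(1.0, 0.0, 20)),
    ("down", make_seq(1.0, 0.0, 30)),
]

# A query spoken at a third rate; DTW absorbs the difference in length.
query = make_seq(0.0, 1.0, 25)
predicted = knn_classify(query, templates, k=3)
print(predicted)
```

In a real system the synthetic ramps would be replaced by MFCC matrices computed from recorded utterances, and the predicted labels over a test set would be tallied into the confusion matrix the abstract mentions.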

