Denoising Speech for MFCC Feature Extraction Using Wavelet Transformation in Speech Recognition System

2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE). Pub Date: 2018-07-01. DOI: 10.1109/ICITEED.2018.8534807

Risanuri Hidayat, Agus Bejo, Sujoko Sumaryono, A. Winursito


Citations: 32

Abstract

Mel-frequency cepstral coefficients (MFCC) are a popular feature extraction method for speech recognition systems. Although the method achieves high accuracy, it is susceptible to noise: the performance of conventional MFCC degrades when the input signal is noisy. This paper applies wavelet denoising to the speech input of the MFCC feature extraction method. Adding a wavelet-based denoising step was expected to improve MFCC performance on noisy signals. The study used 120 speech samples: 30 served as reference data and the other 90 as test data. The test data were mixed with white Gaussian noise and then passed to a speech recognition system that already held the reference data. Wavelet denoising used soft thresholding with the Minimaxi thresholding rule. Eleven wavelets were tested at decomposition level 10. Classification used the K-nearest neighbor (KNN) method. The Fejer-Korovkin 6 wavelet gave the best denoising, achieving the highest accuracy on input signals with an SNR of 5-15 dB, while the Daubechies 5 wavelet achieved high accuracy on input signals with an SNR of 3 dB. All of the tested wavelet denoising methods improved the accuracy of the speech recognition system on input signals with an SNR of 0-10 dB compared with the system without denoising.
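The abstract names three steps without giving formulas: mixing test speech with white Gaussian noise at a given SNR, a minimax threshold rule, and soft thresholding. The Python sketch below illustrates those steps under stated assumptions; it is not the authors' implementation. The function names are hypothetical, the minimax constants are the commonly used closed-form approximation (an assumption, since the paper's exact rule is not shown here), and the noise-level estimate from the median absolute deviation is a standard choice that the abstract does not mention. The level-10 wavelet decomposition, MFCC extraction, and KNN classification are omitted.

```python
import math
import random
import statistics


def add_white_noise(signal, snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR (in dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean version."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


def minimax_threshold(n):
    """Minimax threshold for n coefficients at unit noise level.

    Assumption: closed-form approximation 0.3936 + 0.1829 * log2(n),
    with a threshold of 0 for n <= 32.
    """
    return 0.3936 + 0.1829 * math.log2(n) if n > 32 else 0.0


def estimate_sigma(detail_coeffs):
    """Noise level from the median absolute deviation (assumed estimator)."""
    return statistics.median(abs(c) for c in detail_coeffs) / 0.6745


def soft_threshold(coeffs, thr):
    """Zero coefficients with magnitude <= thr; shrink the rest by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic 200 Hz tone sampled at 16 kHz, standing in for a speech recording.
    clean = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(16000)]
    noisy = add_white_noise(clean, snr_db=5, rng=rng)
    print(f"measured SNR: {measured_snr_db(clean, noisy):.2f} dB")
    print(f"minimax threshold for 1024 coefficients: {minimax_threshold(1024):.4f}")
```

In a full pipeline, `soft_threshold` would be applied to the detail coefficients of each decomposition level, with the threshold scaled by the estimated noise level, before the signal is reconstructed and passed to MFCC extraction.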
