Denoising Speech for MFCC Feature Extraction Using Wavelet Transformation in Speech Recognition System

Risanuri Hidayat, Agus Bejo, Sujoko Sumaryono, A. Winursito

2018 10th International Conference on Information Technology and Electrical Engineering (ICITEE), published 2018-07-01
DOI: 10.1109/ICITEED.2018.8534807 (https://doi.org/10.1109/ICITEED.2018.8534807)
Citations: 32

Abstract

Mel-frequency cepstral coefficients (MFCC) are a popular feature extraction method for speech recognition systems. Although the method yields high accuracy, it is susceptible to noise: the performance of conventional MFCC degrades when the input signal is noisy. This paper presents the implementation of wavelet denoising on the speech input of the MFCC feature extraction method. Adding a denoising step based on the wavelet transform was expected to improve MFCC performance on noisy signals. The study used 120 speech samples, of which 30 served as reference data and the other 90 as test data. The test data were mixed with white Gaussian noise and then tested against the speech recognition system that already held the reference data. The wavelet denoising process used soft thresholding with the Minimaxi thresholding rule. Eleven wavelets were tested in the denoising process at decomposition level 10. Classification used the K-nearest neighbor (KNN) method. The Fejer-Korovkin 6 wavelet was the best denoising method, achieving the highest accuracy on input signals with an SNR of 5-15 dB, while the Daubechies 5 wavelet achieved high accuracy on input signals with an SNR of 3 dB. All of the tested wavelet denoising methods improved the accuracy of the speech recognition system on input signals with an SNR of 0-10 dB compared with the system without denoising.
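
The denoising front end described in the abstract (mix white Gaussian noise into the signal at a chosen SNR, decompose with a wavelet, soft-threshold the detail coefficients, reconstruct) can be illustrated with a minimal Python sketch. This is not the authors' implementation. Several things here are assumptions made only for illustration: the Haar wavelet stands in for the wavelets named in the abstract so that the code needs no external library, a sine wave stands in for speech, the decomposition uses 4 levels rather than 10, one global threshold is derived from the signal length, and the constants in `minimaxi_threshold` are a commonly cited approximation of the minimax threshold. All function names are hypothetical. The MFCC and KNN stages are not shown.

```python
import math
import random
import statistics


def add_white_gaussian_noise(signal, target_snr_db, rng):
    """Return signal plus white Gaussian noise scaled to the target SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (target_snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def soft_threshold(coeffs, thr):
    """Shrink every coefficient toward zero by thr; zero those with |c| <= thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]


def minimaxi_threshold(n):
    """Approximate minimax threshold for n samples at unit noise level (assumed constants)."""
    return 0.0 if n <= 32 else 0.3936 + 0.1829 * math.log2(n)


def haar_dwt(x):
    """One level of the orthonormal Haar transform; len(x) must be even."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / s, (a - d) / s))
    return out


def wavelet_denoise(x, levels):
    """Decompose, soft-threshold all detail bands, and reconstruct."""
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Noise level estimated from the finest detail band (median absolute value).
    sigma = statistics.median(abs(c) for c in details[0]) / 0.6745
    thr = sigma * minimaxi_threshold(len(x))  # single global threshold (assumption)
    details = [soft_threshold(d, thr) for d in details]
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx


def measure_snr_db(clean, estimate):
    """SNR of an estimate relative to the clean reference, in dB."""
    signal_power = sum(c * c for c in clean)
    error_power = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10 * math.log10(signal_power / error_power)


rng = random.Random(0)  # fixed seed so the run is repeatable
clean = [math.sin(2 * math.pi * 4 * i / 1024) for i in range(1024)]
noisy = add_white_gaussian_noise(clean, target_snr_db=5.0, rng=rng)
denoised = wavelet_denoise(noisy, levels=4)
snr_before = measure_snr_db(clean, noisy)
snr_after = measure_snr_db(clean, denoised)
print(f"SNR before: {snr_before:.1f} dB, after: {snr_after:.1f} dB")
```

To get closer to the setup in the abstract, a wavelet library that provides the Daubechies and Fejer-Korovkin families would replace the Haar helpers, and the denoised signal would then be passed to MFCC extraction and a KNN classifier.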