Using Double-Density Dual Tree Wavelet Transform into MFCC for Noisy Speech Recognition
Hay Mar Soe Naing, Risanuri Hidayat, Rudy Hartanto, Y. Miyanaga
2020 12th International Conference on Information Technology and Electrical Engineering (ICITEE), 2020-10-06
DOI: 10.1109/ICITEE49829.2020.9271737 (https://doi.org/10.1109/ICITEE49829.2020.9271737)
Abstract
Automatic speech recognition has made significant progress, both in the underlying technology and in its many applications. However, speech fluctuations caused by noise significantly reduce recognition accuracy, and generating correct word sequences is more difficult on noisy channels than in a clean environment. Extracting meaningful acoustic information from noisy speech utterances therefore remains a challenging task. We present a combination of Mel-frequency cepstral coefficients (MFCC) and a double-density dual-tree wavelet transform denoising algorithm for recognizing noisy speech utterances. A hybrid frame-level cross-entropy deep neural network-hidden Markov model (DNN-HMM) is used for acoustic modeling. Across a suite of experiments, the proposed denoising method provides better performance without affecting accuracy at higher sound intensity levels. Experimental results demonstrate that recognition accuracy reaches up to 96.6% at 10 dB, 91.84% at 5 dB, 78.05% at 0 dB, and 49.37% at -5 dB.
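To illustrate the general idea of wavelet-domain denoising ahead of feature extraction, the following is a minimal sketch only. It is not the double-density dual-tree transform used in the paper: it substitutes a plain orthonormal Haar transform with soft thresholding of the detail coefficients, and the function names, the number of levels, and the threshold value are all illustrative assumptions. MFCC extraction and the DNN-HMM acoustic model are not shown.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level, interleaving the reconstructed sample pairs."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; small ones become zero."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def denoise(signal, levels=2, threshold=0.5):
    """Haar wavelet shrinkage (a stand-in, NOT the paper's transform).

    `levels` and `threshold` are hypothetical settings chosen for the demo.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(soft_threshold(detail, threshold))
    # Rebuild from the coarsest level back to the original resolution.
    for detail in reversed(details):
        approx = haar_inverse(approx, detail)
    return approx


if __name__ == "__main__":
    # A constant signal with a small alternating perturbation as toy "noise".
    noisy = [1.1, 0.9, 1.1, 0.9, 1.1, 0.9, 1.1, 0.9]
    print(denoise(noisy, levels=2, threshold=0.5))
```

In a pipeline of the kind the abstract describes, the denoised waveform would then be passed to an MFCC front end. The dual-tree and double-density variants add redundancy and near shift-invariance that this critically sampled Haar sketch lacks.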