Authors: J. Alex, N. Venkatesan
Journal: Int. J. Comput. Aided Eng. Technol.
Published: 2019-03-07 (journal article)
DOI: https://doi.org/10.1504/IJCAET.2019.098137
Robust optimal sub-band wavelet cepstral coefficient method for speech recognition
This paper proposes a feature extraction technique for speech recognition systems that remains robust in adverse environments. The efficacy of a speech recognition system depends on its feature extraction method. The paper proposes an auditory-scale-like filter bank built by optimal sub-band tree structuring based on the wavelet transform. The optimised wavelet filter bank, together with energy, logarithm, discrete cosine transform and cepstral mean normalisation blocks, forms a robust feature extraction method. The method is validated on a hidden Markov model (HMM)-based single-Gaussian isolated word recognition system under additive white Gaussian noise, street noise and airport noise at different noise levels. Compared with Fourier transform-based methods such as mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), the wavelet transform-based method yielded significant improvement across all noise levels. Experiments were also performed with higher-dimensional MFCC features that include delta and acceleration coefficients (MFCC_D_A). The results show that the wavelet transform-based method improves recognition accuracy by 13% over MFCC_D_A for non-stationary noises.
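The block chain named in the abstract (wavelet sub-bands, energy, logarithm, discrete cosine transform, cepstral mean normalisation) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the optimised tree, the wavelet family or the framing, so the Haar filter, the fixed tree that only re-splits the low band, the frame length, the hop size and all function names below are assumptions chosen for illustration.

```python
import numpy as np

def haar_split(x):
    """One level of orthonormal Haar analysis: (low band, high band) at half rate."""
    x = x[: len(x) // 2 * 2]                  # drop a trailing odd sample
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def subband_energies(frame, depth=4):
    """Sub-band energies of a hypothetical tree that keeps splitting the low band.

    The paper's optimised tree is not given in the abstract; this fixed tree
    only mimics the finer low-frequency resolution of an auditory scale.
    """
    energies = []
    low = np.asarray(frame, dtype=float)
    for _ in range(depth):
        low, high = haar_split(low)
        energies.append(np.sum(high ** 2))
    energies.append(np.sum(low ** 2))
    return np.array(energies[::-1])           # ordered from lowest to highest band

def dct2(v):
    """Unnormalised DCT-II of a 1-D vector."""
    n = len(v)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ v

def wavelet_cepstra(signal, frame_len=256, hop=128, depth=4, eps=1e-10):
    """Frame the signal, then apply energy -> log -> DCT -> cepstral mean normalisation."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    feats = np.array([
        dct2(np.log(subband_energies(signal[s:s + frame_len], depth) + eps))
        for s in starts
    ])
    return feats - feats.mean(axis=0)          # subtract the per-coefficient mean
```

Calling `wavelet_cepstra(x)` on a one-dimensional signal `x` returns one row of `depth + 1` coefficients per frame. A closer approximation of the paper would replace the fixed tree with a wavelet packet tree tuned to an auditory scale, which the abstract describes only at a high level.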