Robust speech recognition under noisy environments using asymmetric tapers
Md. Jahangir Alam, P. Kenny, D. O'Shaughnessy
2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012-10-18
DOI: 10.5281/ZENODO.43036 (https://doi.org/10.5281/ZENODO.43036)
Citations: 4
Abstract
This paper presents asymmetric taper (or window)-based robust Mel frequency cepstral coefficient (MFCC) feature extraction for automatic speech recognition (ASR). MFCC features are commonly computed from a symmetric Hamming-tapered direct spectrum estimate. Symmetric tapers have linear phase and also imply a longer time delay. In ASR systems, phase information is usually discarded because human speech perception is relatively insensitive to short-time phase distortion, so any linearity constraint on phase can be removed without adverse effects. Asymmetric tapers, which have a better frequency response and a shorter time delay, can therefore lead to better recognition performance when used for MFCC feature extraction. With the proposed method, asymmetry can be introduced into any symmetric taper by adjusting only one additional parameter, which controls the degree of asymmetry. Experimental results on the AURORA-2 corpus show that the proposed asymmetric tapers outperform the symmetric Hamming taper in word accuracy in both clean and noisy environments.
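
To make the idea of a single asymmetry parameter concrete, here is a minimal sketch in Python. The abstract does not give the construction, so the exponential weighting below, the function names `hamming` and `asymmetric_taper`, the parameter name `k`, and the 200-sample frame length are all assumptions made for illustration. They may not match the authors' formula.

```python
import math


def hamming(n_samples):
    """Symmetric Hamming taper of length n_samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def asymmetric_taper(symmetric, k):
    """Skew a symmetric taper with an exponential weight (assumed form).

    k = 0 leaves the taper unchanged; k > 0 moves the peak toward the
    end of the frame and k < 0 moves it toward the start. The result is
    rescaled so that its peak equals the peak of the input taper.
    """
    n_samples = len(symmetric)
    skewed = [w * math.exp(k * n / (n_samples - 1))
              for n, w in enumerate(symmetric)]
    scale = max(symmetric) / max(skewed)
    return [w * scale for w in skewed]


# Hypothetical frame length: 200 samples (25 ms at 8 kHz).
symmetric = hamming(200)
late_peak = asymmetric_taper(symmetric, 2.0)
early_peak = asymmetric_taper(symmetric, -2.0)
print(late_peak.index(max(late_peak)), early_peak.index(max(early_peak)))
```

In an MFCC front end, such a taper would simply replace the Hamming window applied to each frame before the magnitude spectrum is computed. Since the phase is discarded at that step, the loss of linear phase has no direct effect on the features.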