Robust speech recognition under noisy environments using asymmetric tapers
Md. Jahangir Alam, P. Kenny, D. O'Shaughnessy
2012 Proceedings of the 20th European Signal Processing Conference (EUSIPCO), 2012-10-18
DOI: 10.5281/ZENODO.43036 (https://doi.org/10.5281/ZENODO.43036)
This paper presents asymmetric taper (or window)-based robust Mel frequency cepstral coefficient (MFCC) feature extraction for automatic speech recognition (ASR). MFCC features are commonly computed from a direct spectrum estimate that uses a symmetric Hamming taper. Symmetric tapers have linear phase but also imply a longer time delay. In ASR systems, phase information is usually discarded, because human speech perception is relatively insensitive to short-time phase distortion, so the linear-phase constraint can be dropped without adverse effects. Asymmetric tapers, which have a better frequency response and a shorter time delay, can therefore lead to better recognition performance when used for MFCC feature extraction. With the proposed method, asymmetry can be introduced into any symmetric taper by adjusting a single additional parameter that controls the degree of asymmetry. Experimental results on the AURORA-2 corpus show that the proposed asymmetric tapers outperform the symmetric Hamming taper in word accuracy in both clean and noisy environments.
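
The abstract does not give the formula used to derive an asymmetric taper, so the following is only a minimal sketch of the general idea under stated assumptions: a symmetric Hamming taper is multiplied by an exponential weight and rescaled to unit peak. The function names, the exponential form, and the parameter `k` are hypothetical choices made for illustration and should not be read as the authors' method.

```python
import math

def hamming(n_samples):
    """Symmetric Hamming taper of length n_samples."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def asymmetric_taper(symmetric, k):
    """Skew a symmetric taper with an exponential weight.

    ASSUMPTION: this weighting is illustrative, not the paper's formula.
    k is a hypothetical asymmetry parameter: k == 1 leaves the taper
    unchanged, k > 1 moves the peak toward the end of the frame, and
    k < 1 moves it toward the start.
    """
    n_samples = len(symmetric)
    skewed = [w * k ** (n / (n_samples - 1)) for n, w in enumerate(symmetric)]
    peak = max(skewed)
    return [w / peak for w in skewed]  # rescale so the peak value is 1

# Odd length so that the symmetric taper has a single, central peak.
sym = hamming(201)
asym = asymmetric_taper(sym, k=4.0)

peak_sym = sym.index(max(sym))     # centre of the frame
peak_asym = asym.index(max(asym))  # shifted toward the end of the frame
```

In an MFCC front end, each speech frame would be multiplied element-wise by the chosen taper before the discrete Fourier transform. Only the magnitude spectrum is kept afterwards, which is why giving up the linear phase of the symmetric taper costs nothing at that stage.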