A novel feature extractor employing regularized MVDR spectrum estimator and subband spectrum enhancement technique
Authors: Md. Jahangir Alam, D. O'Shaughnessy, P. Kenny
Venue: 2013 8th International Workshop on Systems, Signal Processing and their Applications (WoSSPA)
Publication date: 2013-05-12
DOI: 10.1109/WOSSPA.2013.6602388 (https://doi.org/10.1109/WOSSPA.2013.6602388)
Citations: 8
Abstract
This paper presents a novel feature extractor for the robust large vocabulary continuous speech recognition (LVCSR) task. For accurate and robust estimation of the speech power spectrum, we propose computing the features from the regularized minimum variance distortionless response (regMVDR) spectral estimate instead of the windowed periodogram estimate. A sigmoid-shaped subband spectrum enhancement technique and a short-time feature normalization, known as short-time mean and scale normalization (STMSN), are also used for robust estimation of the cepstral features for the speech recognition task. When evaluated on the AURORA-4 LVCSR corpus, the proposed feature extractor provides average relative improvements of 38.5%, 35.0%, 34.3%, 30.7%, 5.6%, and 7.1% over the MFCC, PLP, MVDR-based MFCC, regMVDR-based MFCC, PNCC, and the robust feature extractor of [4], respectively, in terms of recognition accuracy.
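To make the idea of replacing the periodogram with an MVDR estimate concrete, here is a minimal sketch of the textbook MVDR power spectrum, P(w) = 1 / (v(w)^H R^{-1} v(w)), computed for one speech frame. The abstract does not say how the regularization is performed, so this sketch substitutes simple diagonal loading of the autocorrelation matrix as a stand-in; it is not the authors' regMVDR. The function name `mvdr_spectrum` and the parameters `order`, `n_fft`, and `reg` are hypothetical choices made for illustration.

```python
import numpy as np


def mvdr_spectrum(frame, order=12, n_fft=512, reg=0.0):
    """Textbook MVDR power spectrum of one frame: 1 / (v^H R^{-1} v).

    ASSUMPTION: `reg` applies diagonal loading (R + reg * r[0] * I) as a
    stand-in regularizer; the source abstract does not specify its method.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)

    # Biased autocorrelation estimates r[0], ..., r[order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])

    # Symmetric Toeplitz autocorrelation matrix with R[i, j] = r[|i - j|].
    lags = np.arange(order + 1)
    R = r[np.abs(np.subtract.outer(lags, lags))]

    # Diagonal loading, scaled by the frame energy r[0] (assumed regularizer).
    R = R + reg * r[0] * np.eye(order + 1)
    R_inv = np.linalg.inv(R)

    # Steering vectors v(w) = [1, e^{jw}, ..., e^{j*order*w}] on [0, pi].
    omega = np.linspace(0.0, np.pi, n_fft // 2 + 1)
    V = np.exp(1j * np.outer(omega, lags))

    # Quadratic form v^H R^{-1} v at every frequency; real for symmetric R.
    denom = np.einsum("fi,ij,fj->f", V.conj(), R_inv, V).real
    return 1.0 / denom


if __name__ == "__main__":
    # Synthetic frame: a sinusoid at pi/4 rad/sample plus weak white noise.
    t = np.arange(400)
    rng = np.random.default_rng(0)
    frame = np.sin(0.25 * np.pi * t) + 0.01 * rng.standard_normal(t.size)
    spectrum = mvdr_spectrum(frame, order=12, n_fft=512, reg=1e-3)
    print("peak bin:", int(np.argmax(spectrum)), "of", spectrum.size)
```

In a cepstral front end, a spectrum like this would take the place of the squared FFT magnitude before the filterbank, logarithm, and DCT stages. A larger `reg` flattens the estimate and keeps the matrix inversion well conditioned when the frame is nearly periodic.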