Temporal envelope subtraction for robust speech recognition using modulation spectrum
Sriram Ganapathy, Samuel Thomas, H. Hermansky
2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009-12-01
DOI: 10.1109/ASRU.2009.5372922 (https://doi.org/10.1109/ASRU.2009.5372922)
Citations: 7
Abstract
In this paper, we present a new noise compensation technique for modulation frequency features derived from syllable-length segments of subband temporal envelopes. The subband temporal envelopes are estimated using frequency domain linear prediction (FDLP). We propose a technique for noise compensation in FDLP in which an estimate of the noise envelope is subtracted from the noisy speech envelope. The noise-compensated FDLP envelopes are compressed with static (logarithmic) and dynamic (adaptive loops) compression and are then transformed into modulation spectral features. Experiments are performed on a phoneme recognition task and on a connected digit recognition task, where the test data is corrupted with a variety of noise types at different signal-to-noise ratios. In these experiments with mismatched train and test conditions, the proposed features provide considerable improvements over other state-of-the-art noise-robust feature extraction techniques (average relative improvements of 25% and 35% over the baseline PLP features for the phoneme and word recognition tasks, respectively).
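The envelope subtraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the noise envelope is estimated or how negative differences are handled, so the sketch assumes that a subband envelope is already available as a list of non-negative values (FDLP estimation itself is not shown), that the first few samples contain noise only, and that a floor borrowed from conventional spectral subtraction keeps the result positive before logarithmic compression. The names `estimate_noise_level`, `subtract_noise_envelope`, `log_compress`, `n_lead`, and `floor` are hypothetical.

```python
import math


def estimate_noise_level(noisy_env, n_lead):
    """Average the first n_lead samples as a constant noise level.

    Assumption (not from the paper): the leading samples are noise only.
    """
    if n_lead <= 0 or n_lead > len(noisy_env):
        raise ValueError("n_lead must be between 1 and len(noisy_env)")
    lead = noisy_env[:n_lead]
    return sum(lead) / len(lead)


def subtract_noise_envelope(noisy_env, noise_env, floor=0.01):
    """Subtract a noise envelope estimate from a noisy subband envelope.

    `floor` is a hypothetical parameter: each output sample is kept at or
    above floor * noisy sample, so the result stays positive for the log.
    """
    if len(noisy_env) != len(noise_env):
        raise ValueError("envelopes must have the same length")
    return [max(x - n, floor * x) for x, n in zip(noisy_env, noise_env)]


def log_compress(env, eps=1e-12):
    """Static (logarithmic) compression of a positive envelope."""
    return [math.log(max(v, eps)) for v in env]


# Toy envelope: three noise-only samples followed by a speech-like bump.
noisy = [1.0, 1.0, 1.0, 5.0, 9.0, 5.0, 1.0]
level = estimate_noise_level(noisy, n_lead=3)
compensated = subtract_noise_envelope(noisy, [level] * len(noisy))
log_env = log_compress(compensated)
```

In this toy case the noise-only samples fall to the floor while the bump keeps its shape, which is the behavior one would want before compression. The dynamic (adaptive loops) compression and the transform to modulation spectral features mentioned in the abstract are not covered by this sketch.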