{"title":"Normalization on the modulation spectrum of the subband temporal envelopes for automatic speech recognition in reverberant environments","authors":"Xugang Lu, M. Unoki, Satoshi Nakamura","doi":"10.1145/1667780.1667832","DOIUrl":null,"url":null,"abstract":"In this study, we proposed a feature extraction method based on the subband temporal envelopes (STEs) and their normalization for reverberated speech recognition. The STEs were extracted by using a series of constant bandwidth band-pass filters with Hilbert transform followed by a low-pass filtering. In the normalization, both the modulation spectrum (MS) of the subband temporal envelopes of the clean and reverberated speech are normalized to a reference MS calculated from a clean speech data set. Based on the normalized subband MS, the inverse Fourier transform was used to restore the subband temporal envelopes. We tested the proposed method on speech recognition in a reverberant room with different speaker to microphone distance (SMD). For comparison, the recognition performance of using the traditional Mel-cepstral coefficients with mean and variance normalization were used as the baseline. Experimental results showed that, by averaging the SMDs from 50 cm to 400 cm, there was a 44.96% relative improvement by only using subband temporal envelope processing, and further a 15.68% relative improvement by using the normalization on the subband modulation spectrum. Totally, there was about a 53.59% relative improvement, which was better than those of using other temporal filtering and normalization methods.","PeriodicalId":103128,"journal":{"name":"Proceedings of the 3rd International Universal Communication Symposium","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd International Universal Communication Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1667780.1667832","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
In this study, we proposed a feature extraction method based on the subband temporal envelopes (STEs) and their normalization for reverberated speech recognition. The STEs were extracted by using a series of constant bandwidth band-pass filters with Hilbert transform followed by a low-pass filtering. In the normalization, both the modulation spectrum (MS) of the subband temporal envelopes of the clean and reverberated speech are normalized to a reference MS calculated from a clean speech data set. Based on the normalized subband MS, the inverse Fourier transform was used to restore the subband temporal envelopes. We tested the proposed method on speech recognition in a reverberant room with different speaker to microphone distance (SMD). For comparison, the recognition performance of using the traditional Mel-cepstral coefficients with mean and variance normalization were used as the baseline. Experimental results showed that, by averaging the SMDs from 50 cm to 400 cm, there was a 44.96% relative improvement by only using subband temporal envelope processing, and further a 15.68% relative improvement by using the normalization on the subband modulation spectrum. Totally, there was about a 53.59% relative improvement, which was better than those of using other temporal filtering and normalization methods.