Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen
{"title":"Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition","authors":"Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen","doi":"10.1109/ICME.2017.8019509","DOIUrl":null,"url":null,"abstract":"Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on top of the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary learning based approach can provide significant average word error reductions when being integrated with either a GMM-HMM or a DNN-HMM based ASR system.","PeriodicalId":330977,"journal":{"name":"2017 IEEE International Conference on Multimedia and Expo (ICME)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2017-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Multimedia and Expo (ICME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICME.2017.8019509","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Noise robustness has long garnered much interest from researchers and practitioners of the automatic speech recognition (ASR) community due to its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on top of the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that captures the intrinsic temporal structure inherent in the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, considering the nonnegative property of the modulation spectrum amplitudes, we utilize the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified using the standard Aurora-2 database and task. The empirical results show that the proposed dictionary learning based approach can provide significant average word error reductions when being integrated with either a GMM-HMM or a DNN-HMM based ASR system.