Enhancing feature modulation spectra with dictionary learning approaches for robust speech recognition

2017 IEEE International Conference on Multimedia and Expo (ICME). Pub Date: 2017-08-28. DOI: 10.1109/ICME.2017.8019509

Bi-Cheng Yan, Chin-Hong Shih, Shih-Hung Liu, Berlin Chen



Abstract

Noise robustness has long attracted interest from researchers and practitioners in the automatic speech recognition (ASR) community because of its paramount importance to the success of ASR systems. This paper presents a novel approach to improving the noise robustness of speech features, building on the dictionary learning paradigm. To this end, we employ the K-SVD method and its variants to create sparse representations with respect to a common set of basis spectral vectors that capture the intrinsic temporal structure of the modulation spectra of clean training speech features. The enhanced modulation spectra of speech features, constructed by mapping the original modulation spectra into the space spanned by these representative basis vectors, can better carry noise-resistant acoustic characteristics. In addition, because modulation spectrum amplitudes are nonnegative, we use the nonnegative K-SVD method, in combination with the nonnegative sparse coding method, to generate more noise-robust speech features. All experiments were conducted and verified on the standard Aurora-2 database and task. The empirical results show that the proposed dictionary-learning-based approach provides significant average word error reductions when integrated with either a GMM-HMM or a DNN-HMM based ASR system.
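The processing chain described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The function names (`omp`, `ksvd`, `enhance_trajectory`), the choice of orthogonal matching pursuit for sparse coding, the FFT length `n_fft`, and the reuse of the noisy phase are all hypothetical. The final clipping step is only a crude stand-in for the nonnegative K-SVD and nonnegative sparse coding variants the abstract mentions.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k atoms of D."""
    residual = x.astype(float)
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

def ksvd(X, n_atoms, k, n_iter=10, seed=0):
    """Plain K-SVD. X has shape (dim, n_samples); returns D of shape (dim, n_atoms)."""
    rng = np.random.default_rng(seed)
    # Initialise the dictionary with randomly chosen, normalised training vectors.
    D = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True) + 1e-12
    for _ in range(n_iter):
        # Sparse coding stage: one code vector per training sample.
        A = np.column_stack([omp(D, x, k) for x in X.T])
        # Dictionary update stage: refine each atom with a rank-1 SVD fit.
        for j in range(n_atoms):
            users = np.nonzero(A[j])[0]
            if users.size == 0:
                continue  # atom unused in this iteration; leave it unchanged
            A[j, users] = 0.0
            E = X[:, users] - D @ A[:, users]  # error without atom j
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j] = U[:, 0]
            A[j, users] = s[0] * Vt[0]
    return D

def enhance_trajectory(traj, D, k, n_fft):
    """Enhance one feature trajectory (one feature dimension over time).

    Assumes len(traj) <= n_fft and D.shape[0] == n_fft // 2 + 1.
    """
    spec = np.fft.rfft(traj, n=n_fft)           # modulation spectrum
    mag, phase = np.abs(spec), np.angle(spec)
    mag_hat = D @ omp(D, mag, k)                # project onto learned atoms
    mag_hat = np.maximum(mag_hat, 0.0)          # assumption: clip to keep magnitudes nonnegative
    enhanced = np.fft.irfft(mag_hat * np.exp(1j * phase), n=n_fft)
    return enhanced[: len(traj)]
```

In this sketch the dictionary would be learned once from the magnitude modulation spectra of clean training trajectories (stacked as the columns of `X`), and `enhance_trajectory` would then be applied to each feature dimension of a test utterance before recognition.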


