Improved cepstra minimum-mean-square-error noise reduction algorithm for robust speech recognition

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Pub Date: 2017-03-07. DOI: 10.1109/ICASSP.2017.7953081

Jinyu Li, Yan Huang, Y. Gong


Citations: 6

Abstract

In the era of deep learning, beam-forming multi-channel signal processing is still very helpful, but it has been reported that single-channel robust front-ends usually do not benefit deep learning models: their layer-by-layer structure provides a feature extraction strategy that automatically derives powerful noise-resistant features from primitive raw data for senone classification. In this study, we show that a single-channel robust front-end is still very beneficial to deep learning modelling as long as it is well designed. We improve a robust front-end, cepstra minimum mean square error (CMMSE), by using a more reliable voice activity detector, refined prior SNR estimation, better gain smoothing, and two-stage processing. This new front-end, improved CMMSE (ICMMSE), is evaluated on the standard Aurora 2 and Chime 3 tasks and on a 3400-hour Microsoft Cortana digital assistant task, using Gaussian mixture models, feed-forward deep neural networks, and long short-term memory recurrent neural networks, respectively. ICMMSE is shown to be superior regardless of the underlying acoustic model and the scale of the evaluation task, with a 25.46% relative word error rate (WER) reduction on Aurora 2, up to 11.98% relative WER reduction on Chime 3, and up to 11.01% relative WER reduction on the Cortana digital assistant task.
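
The abstract reports its gains as relative WER reductions. As a reading aid, the sketch below shows how such a figure is conventionally computed from a baseline WER and an improved WER. The function name and the example numbers are hypothetical and are not taken from the paper, which does not list absolute WERs in the abstract.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative WER reduction in percent.

    Computed as 100 * (baseline - new) / baseline, the usual convention
    for reporting relative improvements in speech recognition.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs (in percent), chosen only to illustrate the arithmetic.
baseline, improved = 20.0, 15.0
print(f"{relative_wer_reduction(baseline, improved):.2f}%")  # prints "25.00%"
```

A relative figure depends on the baseline: the same absolute drop of 5 points is a 25% relative reduction from a 20% baseline but only a 10% relative reduction from a 50% baseline.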
