Multiresolution CNN for reverberant speech recognition

Sunchan Park, Yongwon Jeong, H. S. Kim
Published in: 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA)
Publication date: 2017-11-01
DOI: 10.1109/ICSDA.2017.8384470 (https://doi.org/10.1109/ICSDA.2017.8384470)
Citations: 18

Abstract

Deep neural network (DNN) acoustic models have greatly improved the performance of automatic speech recognition (ASR). However, DNN-based systems still perform poorly in reverberant environments. Convolutional neural network (CNN) acoustic models have shown a lower word error rate (WER) in distant speech recognition than fully connected DNN acoustic models. To improve reverberant speech recognition with CNN acoustic models, we propose a multiresolution CNN with two separate streams: one takes a wideband feature with a wide context window, and the other takes a narrowband feature with a narrow context window. Experiments on the ASR task of the REVERB Challenge 2014 showed that the proposed multiresolution CNN reduced the WER by 8.79% and 8.83% on the simulated test data and the real-condition test data, respectively, compared with a conventional CNN-based method.
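The abstract does not give the window sizes or feature dimensions, so the following is only a minimal sketch of how the inputs to two streams with different context widths could be prepared. The function names (`splice_context`, `two_stream_inputs`), the context widths (8 and 2 frames per side), and the edge-padding rule are all assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Tuple

Frame = List[float]


def splice_context(frames: List[Frame], left: int, right: int) -> List[Frame]:
    """Concatenate each frame with its `left` previous and `right` next frames.

    Positions that fall outside the utterance reuse the nearest edge frame
    (an assumed padding rule; the paper's choice is not stated in the abstract).
    """
    if not frames:
        return []
    last = len(frames) - 1
    spliced: List[Frame] = []
    for t in range(len(frames)):
        window: Frame = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), last)  # clamp index to [0, last]
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def two_stream_inputs(
    wide_feats: List[Frame],
    narrow_feats: List[Frame],
    wide_ctx: int = 8,    # hypothetical: frames per side for the wide-context stream
    narrow_ctx: int = 2,  # hypothetical: frames per side for the narrow-context stream
) -> Tuple[List[Frame], List[Frame]]:
    """Build the per-frame inputs for the two streams from two feature sequences."""
    if len(wide_feats) != len(narrow_feats):
        raise ValueError("both feature sequences must have the same number of frames")
    return (
        splice_context(wide_feats, wide_ctx, wide_ctx),
        splice_context(narrow_feats, narrow_ctx, narrow_ctx),
    )
```

Each stream would then feed its own convolutional layers before the two are combined. How the streams are merged is not described in the abstract, so it is left out of the sketch.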