UNet++-Based Multi-Channel Speech Dereverberation and Distant Speech Recognition

Tuo Zhao, Yunxin Zhao, Shaojun Wang, Mei Han
Published in: 2021 12th International Symposium on Chinese Spoken Language Processing (ISCSLP)
Publication date: 2021-01-24
DOI: 10.1109/ISCSLP49672.2021.9362064 (https://doi.org/10.1109/ISCSLP49672.2021.9362064)
Citations: 8

Abstract

We propose a novel approach that uses a recently introduced fully convolutional network (FCN) architecture, UNet++, for multichannel speech dereverberation and distant speech recognition (DSR). While the earlier FCN architecture UNet is good at exploiting the time-frequency structure of speech, UNet++ offers better robustness with respect to network depth and skip connections. For DSR, UNet++ serves as a feature enhancement front-end, and the enhanced speech features are used for acoustic model training and recognition. We also propose a frequency-dependent convolution scheme (FDCS), which yields new variants of UNet and UNet++. We present DSR results on the multiple distant microphone (MDM) datasets of the AMI meeting corpus, and compare the performance of UNet++ with that of UNet and weighted prediction error (WPE). Our results demonstrate that for DSR, the UNet++-based approaches provide large word error rate (WER) reductions over their UNet- and WPE-based counterparts. UNet++ with WPE preprocessing and 4-channel input achieves the lowest WERs. The dereverberation results are also measured by the speech-to-reverberation modulation energy ratio (SRMR), which likewise shows large gains of UNet++ over UNet and WPE.
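The abstract reports its DSR results as word error rate (WER) reductions. As a reminder of what that metric measures, here is a minimal sketch of the standard WER definition (word-level edit distance divided by the number of reference words), together with a relative-reduction helper. The function names and the example sentences are illustrative assumptions; this is not the scoring tool used in the paper, and the numbers are not results from it.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which improved_wer is lower than baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative sentences only, not taken from the AMI corpus.
    wer = word_error_rate("the meeting starts at noon", "the meeting start at noon")
    print(f"WER: {wer:.2f}")
```

In practice, WER is accumulated over a whole test set (total errors divided by total reference words) rather than averaged per utterance, so a per-sentence function like this one would be applied to aligned counts, not to per-utterance ratios.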