Single channel speech enhancement using convolutional neural network

Tomás Kounovský, J. Málek
Published in: 2017 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM)
Publication date: 2017-05-01
DOI: 10.1109/ECMSM.2017.7945915
URL: https://doi.org/10.1109/ECMSM.2017.7945915
Citations: 55

Abstract

Neural networks can be used to identify and remove noise from a noisy speech spectrum (denoising autoencoders, DAEs). DAEs are typically implemented using a fully connected feed-forward topology. One of two targets is usually used for the DAE: 1) an ideal frequency ratio mask, which is applied to the noisy spectrum to estimate the clean speech spectrum (masking), or 2) the clean speech spectrum itself (mapping). Recent research in automatic speech recognition shows that convolutional neural networks are very promising for speech modeling. In this paper, we therefore propose constructing DAEs with a convolutional topology. We investigate the suitability of both target types described above and compare the results with fully connected DAEs. Our experiments indicate that mapping-based convolutional networks estimating log-power spectra achieve a significant improvement over all competing topologies and target types. We observe a performance gain of 8% in PESQ score over ideal ratio masking. We also investigate how the language diversity of the training set affects the ability of DAEs to enhance speech in an unseen language. Our experiments suggest that training DAEs on language-diverse training sets does not yield any significant benefit for speech enhancement.
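The difference between the two target types named in the abstract (masking and mapping) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The frame length, hop size, the power-ratio form of the ideal ratio mask, the synthetic sine-plus-noise signal, and all function names are assumptions chosen for illustration, and no network is trained here. Only the training targets are computed.

```python
import numpy as np

def power_spectrum(signal, frame_len=256, hop=128):
    """Short-time power spectrum, shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def masking_target(clean_pow, noise_pow, eps=1e-10):
    """Ideal ratio mask per time-frequency bin (one common definition, assumed here)."""
    return clean_pow / (clean_pow + noise_pow + eps)

def mapping_target(clean_pow, eps=1e-10):
    """Log-power spectrum of the clean signal, which a mapping network estimates directly."""
    return np.log(clean_pow + eps)

def apply_mask(mask, noisy_pow):
    """Masking-based enhancement: scale each bin of the noisy power spectrum."""
    return mask * noisy_pow

# Synthetic stand-in for speech (hypothetical data, one second at 16 kHz).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.3 * rng.standard_normal(t.shape)
noisy = clean + noise

clean_pow = power_spectrum(clean)
noise_pow = power_spectrum(noise)
noisy_pow = power_spectrum(noisy)

irm = masking_target(clean_pow, noise_pow)   # target for a masking DAE
log_target = mapping_target(clean_pow)       # target for a mapping DAE
enhanced_pow = apply_mask(irm, noisy_pow)    # how a predicted mask would be used
```

In the masking setup the network output is bounded between 0 and 1 and must be multiplied with the noisy spectrum. In the mapping setup the network output is itself the estimate of the clean log-power spectrum.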