Single channel speech enhancement using convolutional neural network

2017 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM). Pub Date: 2017-05-01. DOI: 10.1109/ECMSM.2017.7945915

Tomás Kounovský, J. Málek

Citations: 55

Abstract

Single channel speech enhancement using convolutional neural network

Neural networks can be used to identify and remove noise from the spectrum of noisy speech (denoising autoencoders, DAEs). DAEs are typically implemented using a fully connected feed-forward topology. Usually, one of the following is used as the DAE target: 1) the ideal frequency ratio mask, which is applied to the noisy spectrum to estimate the clean speech spectrum (masking), or 2) the clean speech spectrum directly (mapping). Recent research in automatic speech recognition shows that convolutional neural networks are very promising for speech modeling. In this paper, we therefore propose constructing DAEs with a convolutional topology. We investigate the suitability of both target types described above and compare the results with fully connected DAEs. Our experiments indicate that mapping-based convolutional networks estimating log-power spectra achieve a significant improvement over all competing topologies and target types. Performance gains of 8% in PESQ scores over ideal ratio masking are observed. We also investigate how the language diversity of the training set affects the ability of DAEs to enhance speech in an unseen language. Our experiments suggest that training DAEs on language-diverse training sets does not yield any significant benefit for the task of speech enhancement.
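The two target types named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The frame length, hop size, the sine-plus-white-noise test signal, and all function names are assumptions chosen for illustration. The mask formula (speech power divided by speech-plus-noise power) is one common definition of an ideal ratio mask and may differ from the one used in the paper.

```python
import numpy as np

def power_spectrogram(x, frame_len=256, hop=128):
    """Power spectrogram from a Hann-windowed short-time FFT (frames x bins).
    frame_len and hop are illustrative values, not taken from the paper."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def ideal_ratio_mask(clean_pow, noise_pow, eps=1e-12):
    """Assumed mask definition: speech power over speech-plus-noise power."""
    return clean_pow / (clean_pow + noise_pow + eps)

def log_power(pow_spec, eps=1e-12):
    """Log-power spectrum; eps guards against log(0)."""
    return np.log(pow_spec + eps)

# Synthetic stand-ins: a sine tone plays the role of clean speech.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noise = 0.3 * rng.standard_normal(t.shape)
noisy = clean + noise

clean_pow = power_spectrogram(clean)
noise_pow = power_spectrogram(noise)
noisy_pow = power_spectrogram(noisy)

# Masking: the network would be trained to predict the mask from noisy
# features; the mask is then multiplied with the noisy spectrum.
mask_target = ideal_ratio_mask(clean_pow, noise_pow)
masked_estimate = mask_target * noisy_pow

# Mapping: the network would be trained to predict the clean log-power
# spectrum directly from the noisy log-power spectrum.
mapping_input = log_power(noisy_pow)
mapping_target = log_power(clean_pow)
```

In both cases the network input is derived from the noisy signal only. The clean and noise signals are needed just to build the training targets, which is why such targets are called "ideal".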
