Single channel speech enhancement using convolutional neural network
Authors: Tomás Kounovský, J. Málek
Published in: 2017 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM)
Publication date: 2017-05-01
DOI: 10.1109/ECMSM.2017.7945915 (https://doi.org/10.1109/ECMSM.2017.7945915)
Citations: 55
Abstract
Neural networks can be used to identify and remove noise from a noisy speech spectrum (denoising autoencoders, DAEs). DAEs are typically implemented with a fully connected feed-forward topology. Usually one of the following is used as the DAE target: 1) the ideal frequency ratio mask, which is applied to the noisy spectrum to estimate the clean speech spectrum (masking), or 2) the clean speech spectrum directly (mapping). Recent research in automatic speech recognition shows that convolutional neural networks are very promising for speech modeling. In this paper we therefore propose constructing DAEs with a convolutional topology. We investigate the suitability of both target types described above and compare the results with fully connected DAEs. Our experiments indicate that mapping-based convolutional networks estimating log-power spectra achieve a significant improvement over all competing topologies and target types. We observe a gain of 8% in PESQ scores over ideal ratio masking. We also investigate the ability of DAEs to enhance speech in an unseen language, depending on the language diversity of the training set. Our experiments suggest that training DAEs on language-diverse training sets does not yield any significant benefit for the task of speech enhancement.
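The two target types named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the mask definition (clean power divided by clean plus noise power), the function names `ideal_ratio_mask` and `log_power`, the additive power model, and the random stand-in spectra are all assumptions made for illustration, since the abstract does not give exact formulas.

```python
import numpy as np

def ideal_ratio_mask(clean_power, noise_power, eps=1e-12):
    # Assumed mask definition: fraction of the power in each
    # time-frequency bin that belongs to speech. Values lie in [0, 1].
    return clean_power / (clean_power + noise_power + eps)

def log_power(power, eps=1e-12):
    # Log-power spectrum; eps guards against log(0).
    return np.log(power + eps)

# Hypothetical stand-in data: 4 frames x 8 frequency bins of power spectra.
rng = np.random.default_rng(0)
clean_power = rng.uniform(0.1, 1.0, size=(4, 8))
noise_power = rng.uniform(0.1, 1.0, size=(4, 8))

# Assumption: speech and noise are uncorrelated, so their powers add.
noisy_power = clean_power + noise_power

# Masking: the network would be trained to predict the mask, which is
# then multiplied with the noisy spectrum to estimate the clean one.
mask_target = ideal_ratio_mask(clean_power, noise_power)
masked_estimate = mask_target * noisy_power

# Mapping: the network would be trained to predict the clean
# log-power spectrum directly from the noisy input.
mapping_target = log_power(clean_power)
```

Under the additive power assumption, applying the ideal mask recovers the clean power spectrum exactly, so both targets encode the same information in this toy setting. They differ in what the network must output: a bounded ratio per bin for masking, or an unbounded log-power value for mapping.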