Speech enhancement by iterating forward pass through U-net
Authors: Tomasz Grzywalski, S. Drgas
Venue: 2020 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA)
Published: 2020-09-23
DOI: 10.23919/spa50552.2020.9241307 (https://doi.org/10.23919/spa50552.2020.9241307)
In recent years, speech enhancement has made great progress, driven mostly by bigger and more sophisticated neural networks. In this work, we investigate whether a state-of-the-art speech enhancement neural network can be modified so that it processes the noisy signal multiple times, with the expectation that the enhancement improves with each iteration. Experiments on the WSJ0, Noisex-92 and DCASE datasets show that a U-net with gated dilated convolutions achieves better SI-SDR, STOI and PESQ after processing the noisy signal twice, and that the improvement is consistent across all SNRs and tested noise types. This is achieved with no additional trainable parameters and no additional memory requirements compared to the baseline model.
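The idea of reusing one enhancer for several passes, and of scoring each pass with SI-SDR, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the network is modified to accept its own output, so the sketch simply assumes the output of one pass is fed back as the input of the next. The names `iterate_enhancer`, `si_sdr` and `moving_average` are hypothetical, and the moving-average filter is only a toy stand-in for the U-net.

```python
import math
import random
from typing import Callable, List, Sequence

Signal = List[float]


def iterate_enhancer(enhance: Callable[[Signal], Signal],
                     noisy: Sequence[float],
                     n_passes: int = 2) -> Signal:
    """Apply the same enhancer n_passes times, feeding each output back in.

    The enhancer is reused unchanged, so no parameters are added per pass.
    (Assumption: plain output-to-input feedback; the paper may differ.)
    """
    if n_passes < 1:
        raise ValueError("n_passes must be at least 1")
    signal = list(noisy)
    for _ in range(n_passes):
        signal = enhance(signal)
    return signal


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant signal-to-distortion ratio in dB."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / ref_energy  # optimal scaling of the reference
    target = [scale * r for r in reference]
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    if noise_energy == 0.0:
        return math.inf  # estimate is a scaled copy of the reference
    return 10.0 * math.log10(target_energy / noise_energy)


def moving_average(signal: Signal) -> Signal:
    """Toy stand-in enhancer: 3-tap moving average (NOT a U-net)."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - 1):min(n, i + 2)]
        out.append(sum(window) / len(window))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * t / 100) for t in range(400)]
    noisy = [c + rng.gauss(0.0, 0.3) for c in clean]
    print("noisy    SI-SDR [dB]:", si_sdr(noisy, clean))
    for k in (1, 2):
        enhanced = iterate_enhancer(moving_average, noisy, n_passes=k)
        print(f"{k} pass(es) SI-SDR [dB]:", si_sdr(enhanced, clean))
```

Because the same `enhance` callable is applied on every pass, the parameter count stays fixed, and only one signal is held at a time, which mirrors the abstract's claim of no additional trainable parameters or memory. Whether a second pass actually helps depends on the enhancer; the toy filter here makes no such guarantee.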