Significance of Phase in DNN based speech enhancement algorithms
P. Rani, Sivaganesh Andhavarapu, S. Kodukula
2020 National Conference on Communications (NCC), February 2020. DOI: 10.1109/NCC48643.2020.9056089
Most speech enhancement algorithms estimate the magnitude spectrum of the clean speech signal from that of the noisy speech signal, using either spectral regression or spectral masking. Because the phase of the short-time Fourier transform (STFT) is difficult to process, the noisy phase is reused when synthesizing the waveform from the enhanced magnitude spectrum. To demonstrate the significance of phase in speech enhancement, we compare the phase obtained from different reconstruction methods, such as the Griffin-Lim algorithm and minimum-phase reconstruction, with the gold phase (the clean phase). In this work, a spectral magnitude mask (SMM) is estimated using deep neural networks to enhance the magnitude spectrum of the speech signal. The experimental results show that the gold phase outperforms the phase reconstruction methods on all objective measures, illustrating the significance of enhancing the noisy phase in speech enhancement.
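The role of phase described above can be illustrated with a toy, single-frame sketch. It is not the authors' system: there is no DNN, the mask is an oracle computed from the clean signal, the SMM is assumed to follow the common definition |S(k)| / |Y(k)|, and the "speech" is a synthetic two-tone signal. All function names are hypothetical. The sketch applies the mask to the noisy magnitude and then synthesizes the frame twice, once with the noisy phase and once with the gold (clean) phase.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_magnitude_mask(clean_spec, noisy_spec, eps=1e-12):
    """Oracle SMM per bin, assumed here to be |S(k)| / |Y(k)|."""
    return [abs(s) / (abs(y) + eps) for s, y in zip(clean_spec, noisy_spec)]


def synthesize(magnitude, phase_source):
    """Combine a magnitude spectrum with the phase of another spectrum."""
    spec = [m * cmath.exp(1j * cmath.phase(p))
            for m, p in zip(magnitude, phase_source)]
    return idft(spec)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
n = 64  # one short analysis frame (toy size, an assumption)
clean = [math.sin(2 * math.pi * 5 * t / n) + 0.5 * math.sin(2 * math.pi * 12 * t / n)
         for t in range(n)]
noisy = [c + random.gauss(0.0, 0.5) for c in clean]

S = dft(clean)
Y = dft(noisy)

# Enhance the magnitude only: mask times noisy magnitude.
mask = spectral_magnitude_mask(S, Y)
enhanced_mag = [m * abs(y) for m, y in zip(mask, Y)]

# Same enhanced magnitude, two different phase choices.
with_noisy_phase = synthesize(enhanced_mag, Y)
with_clean_phase = synthesize(enhanced_mag, S)

err_noisy = mse(with_noisy_phase, clean)
err_clean = mse(with_clean_phase, clean)
print("MSE with noisy phase:", err_noisy)
print("MSE with gold phase: ", err_clean)
```

Because the mask here is an oracle, the enhanced magnitude matches the clean magnitude, so any remaining error in the noisy-phase reconstruction comes from the phase alone. In a real system the mask would be predicted by a network and the comparison would use objective speech quality measures over full STFTs rather than a per-frame mean squared error.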