The effect of reducing the acoustic-frequency resolution for spectrograms used in deep denoising auto-encoder
Yan-Tong Chen, Shu-Ting Tsai, J. Hung
2021 IEEE International Conference on Consumer Electronics-Taiwan (ICCE-TW), 2021-09-15
DOI: 10.1109/ICCE-TW52618.2021.9602986 (https://doi.org/10.1109/ICCE-TW52618.2021.9602986)
Citations: 1
Abstract
In this study, we investigate the effect of varying the acoustic-frequency resolution of the spectrogram used as the input to the deep denoising auto-encoder (DDAE). The DDAE is a well-known deep learning structure that learns the relationship between a noisy signal and its clean, noise-free counterpart. The most commonly used representation of the input signal for training the DDAE is arguably the spectrogram, which is the ordered series of short-time Fourier transforms (STFT) of the frames of the input signal. In this paper, we reduce the acoustic-frequency resolution of the STFT and examine its effect on the learned DDAE in terms of the quality and intelligibility of the output signals. Preliminary experimental results indicate that halving the number of input frequency points (i.e., reducing the frequency resolution by a factor of 2) yields a learned DDAE with almost the same speech quality and intelligibility, while also down-scaling the input feature and reducing the computational complexity of the DDAE.
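
To make the idea of halving the input frequency points concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the resolution was reduced, so averaging adjacent STFT bins is an assumption here, as are the frame length (512), hop size (256), Hann window, 16 kHz sampling rate, and the helper names `stft_magnitude` and `halve_frequency_bins`. Random noise stands in for a noisy speech signal.

```python
import numpy as np

def stft_magnitude(signal, frame_len=512, hop=256):
    """Magnitude spectrogram: one row per frame, frame_len // 2 + 1 frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def halve_frequency_bins(spec):
    """Average adjacent pairs of bins (assumed reduction scheme).

    If the number of bins is odd, the unpaired last bin is dropped.
    """
    n_pairs = spec.shape[1] // 2
    paired = spec[:, : 2 * n_pairs].reshape(spec.shape[0], n_pairs, 2)
    return paired.mean(axis=2)

rng = np.random.default_rng(0)
noisy = rng.standard_normal(16000)   # hypothetical stand-in for 1 s of noisy speech
full = stft_magnitude(noisy)         # full-resolution input feature
reduced = halve_frequency_bins(full) # half-resolution input feature
print(full.shape, reduced.shape)
```

Under these assumptions each frame shrinks from 257 to 128 frequency points, so a fully connected first layer of a DDAE fed with the reduced feature would need roughly half as many input weights. Using a shorter analysis frame would be another way to lower the frequency resolution; which variant matches the paper cannot be determined from the abstract alone.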