Exploiting the Non-Uniform Frequency-Resolution Spectrograms to Improve the Deep Denoising Auto-Encoder for Speech Enhancement

J. Hung, Shu-Ting Tsai, Yan-Tong Chen
DOI: 10.1109/ICASI52993.2021.9568478
Published in: 2021 7th International Conference on Applied System Innovation (ICASI), 2021-09-24

Abstract

This study focuses on improving the deep denoising autoencoder (DDAE) for speech enhancement by reducing the size of its input features. The DDAE is a well-known deep learning structure that learns a mapping from a noisy signal to its clean, noise-free counterpart. One of the most commonly used representations of the input signal for training a DDAE is the spectrogram, i.e., the ordered sequence of short-time Fourier transforms (STFTs) of the successive frames of the input signal. In this study, we examine a variant of the spectrogram as the DDAE input: it has a non-uniform acoustic frequency resolution and is therefore a downscaled version of the original spectrogram. More specifically, we decompose the original full-resolution spectrogram into four sub-bands and then down-sample the spectral points of each sub-band in turn, with higher-frequency sub-bands receiving larger decimation factors. Overall, around 50% of the spectral points are dropped. Preliminary experiments on utterances corrupted by various noise types (babble, baby cry, car, engine, and white noise) reveal that halving the input spectral points with this non-uniform sampling enables the learned DDAE to deliver higher speech quality and intelligibility for the test signals. Therefore, the new method can improve the denoising performance of the DDAE while also reducing its computational complexity.
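The sub-band decimation described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation, and it rests on assumptions the abstract does not state: a 512-point Hann-windowed STFT with hop 256, four roughly equal-width sub-bands, hypothetical decimation factors (1, 2, 3, 4) that grow with frequency, and decimation by simply keeping every k-th bin. The paper's actual band edges, factors, and framing may differ.

```python
import numpy as np

# Assumed (hypothetical) STFT settings; the abstract does not give them.
N_FFT = 512   # 512-point frames -> N_FFT // 2 + 1 = 257 frequency bins
HOP = 256     # 50% frame overlap


def magnitude_spectrogram(x, n_fft=N_FFT, hop=HOP):
    """Log-magnitude spectrogram, shape (n_fft // 2 + 1, n_frames).

    Each column is the STFT of one Hann-windowed frame of x.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=0)))


def nonuniform_downsample(spec, factors=(1, 2, 3, 4)):
    """Split the frequency axis into len(factors) roughly equal sub-bands
    (lowest first) and keep every k-th bin of each sub-band.

    The factors are hypothetical; they only need to grow with frequency.
    """
    bands = np.array_split(spec, len(factors), axis=0)
    return np.concatenate([band[::k] for band, k in zip(bands, factors)], axis=0)


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal(16000)  # 1 s of noise at 16 kHz
    spec = magnitude_spectrogram(x)
    reduced = nonuniform_downsample(spec)
    print(spec.shape, reduced.shape)  # (257, 61) (135, 61)
    print(f"dropped: {1 - reduced.shape[0] / spec.shape[0]:.1%}")  # dropped: 47.5%
```

Under these assumed settings each 257-bin frame shrinks to 135 bins (65 + 32 + 22 + 16), a drop of about 47.5%, which is in line with the roughly 50% reported above. For a fully connected input layer, the first-layer weight count scales with the number of input bins, so it falls by the same proportion; this is one way the reduced input lowers the DDAE's computational cost.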