Time-Frequency Loss for CNN Based Speech Super-Resolution
Heming Wang, Deliang Wang
ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 861-865, May 2020
DOI: 10.1109/ICASSP40776.2020.9053712
Citations: 17
Abstract
Speech super-resolution (SR), also called speech bandwidth extension (BWE), aims to increase the sampling rate of a given lower-resolution speech signal. Recent years have witnessed the successful application of deep neural networks in the time or frequency domain, and deep learning has improved performance considerably compared with conventional approaches. This paper proposes an autoencoder-based fully convolutional neural network (CNN) that merges information from both the time and frequency domains. At training time, we optimize the CNN using a new time-frequency loss (T-F loss), which combines a time-domain loss and a frequency-domain loss. The experimental results show that our model trained with the T-F loss achieves significantly better results than other state-of-the-art models, and yields balanced performance in terms of time and frequency metrics.
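The core idea of the abstract — summing a time-domain waveform loss with a frequency-domain spectral loss — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the choice of mean squared error for both terms, the log-magnitude spectrum, and the weighting factor `alpha` are all assumptions made for the sketch.

```python
import numpy as np

def tf_loss(pred, target, alpha=0.5, eps=1e-8):
    """Hypothetical time-frequency (T-F) loss sketch.

    Combines a time-domain term and a frequency-domain term;
    the exact terms and weighting in the paper may differ.
    """
    # Time-domain term: mean squared error between raw waveforms.
    t_loss = np.mean((pred - target) ** 2)

    # Frequency-domain term: MSE between log-magnitude spectra,
    # computed here with a plain real FFT (eps avoids log(0)).
    pred_mag = np.abs(np.fft.rfft(pred))
    target_mag = np.abs(np.fft.rfft(target))
    f_loss = np.mean((np.log(pred_mag + eps) - np.log(target_mag + eps)) ** 2)

    # Weighted combination of the two domains.
    return alpha * t_loss + (1 - alpha) * f_loss
```

Balancing the two terms is the point of such a loss: a purely time-domain objective tends to score well on waveform metrics (e.g., SNR) while underfitting high-frequency spectral detail, and a purely spectral objective has the opposite bias.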