Investigating Training Objectives for Generative Speech Enhancement

arXiv - EE - Audio and Speech Processing | Pub Date: 2024-09-16 | DOI: arxiv-2409.10753

Julius Richter, Danilo de Oliveira, Timo Gerkmann

Citations: 0

Abstract

Generative speech enhancement has recently shown promising advances in improving speech quality in noisy environments. Multiple diffusion-based frameworks exist, each employing distinct training objectives and learning techniques. This paper aims to explain the differences between these frameworks by focusing on score-based generative models and the Schrödinger bridge. We conduct a series of comprehensive experiments to compare their performance and highlight differing training behaviors. Furthermore, we propose a novel perceptual loss function tailored to the Schrödinger bridge framework, demonstrating enhanced performance and improved perceptual quality of the enhanced speech signals. All experimental code and pre-trained models are publicly available to facilitate further research and development in this area.
