One-Shot Voice Conversion by Vector Quantization

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); Pub Date: 2020-05-01; DOI: 10.1109/ICASSP40776.2020.9053854

Da-Yi Wu, Hung-yi Lee

Citations: 64

Abstract

In this paper, we propose a vector quantization (VQ) based one-shot voice conversion (VC) approach that requires no supervision from speaker labels. We model the content embedding as a sequence of discrete codes and take the difference between the vectors before and after quantization as the speaker embedding. We show that this approach disentangles content and speaker information well using only a reconstruction loss, which makes one-shot VC possible.
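The split described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the toy shapes, the random codebook, and the choice to average the residual over time to get one speaker vector per utterance are all assumptions made for illustration. The encoder and decoder networks are omitted, so `z` stands in for encoder output and the result stands in for decoder input.

```python
import numpy as np

def quantize(z, codebook):
    """Map each frame of z (T, D) to its nearest codebook entry (K, D)."""
    # Squared Euclidean distance between every frame and every code: (T, K)
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)      # discrete content codes, shape (T,)
    return codes, codebook[codes]     # quantized vectors, shape (T, D)

def split_content_speaker(z, codebook):
    """Content = quantized vectors; speaker = what quantization discards."""
    codes, content = quantize(z, codebook)
    # Assumption: average the before-minus-after residual over time
    # to obtain a single utterance-level speaker vector of shape (D,).
    speaker = (z - content).mean(axis=0)
    return codes, content, speaker

def convert(z_source, z_target, codebook):
    """Combine source content with target speaker (hypothetical decoder input)."""
    _, content, _ = split_content_speaker(z_source, codebook)
    _, _, speaker = split_content_speaker(z_target, codebook)
    return content + speaker          # (D,) broadcasts over (T, D)

# Toy data standing in for encoder outputs of two utterances.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))    # K=8 codes of dimension D=4
z_src = rng.normal(size=(5, 4))       # source utterance, 5 frames
z_tgt = rng.normal(size=(7, 4))       # target speaker utterance, 7 frames
decoder_input = convert(z_src, z_tgt, codebook)
```

Only one utterance from the target speaker is needed to compute `speaker`, which is the sense in which the conversion is one-shot. In a trained system the codebook and encoder would be learned with the reconstruction loss rather than drawn at random.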

