Hellinger Distance Based Conditional Variational Auto-Encoder and Its Application in Raw Audio Generation

2018 IEEE 18th International Conference on Communication Technology (ICCT). Pub Date: 2018-10-01. DOI: 10.1109/ICCT.2018.8600275

Yutian Wang, T. Hu, Zheng Wang, Juanjuan Cai, Huaichang Du

Citations: 0

Abstract

Audio generation plays an important role in human-computer interaction applications. However, machine-generated audio is still far from natural sound, especially in expressiveness and complexity. The conditional variational auto-encoder (cVAE) has achieved excellent results in data generation, but the original cVAE cannot avoid the defects caused by the KL divergence, which it uses to measure the discrepancy between stochastic distributions. This paper introduces the Hellinger distance into the cVAE model. First, experiments show that using the Hellinger distance effectively mitigates the weaknesses of the KL divergence. Second, the relationship between the latent space parameters and the quality of the generated music is analyzed experimentally, and the best generative parameter is found to be the distribution centroid. Finally, the generated music is evaluated subjectively, and the results show that it is significantly better than the output of the original model.
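The abstract does not state the regularization term itself, so the following is only a minimal sketch of the contrast it describes, under the assumption that the encoder outputs a diagonal Gaussian and the prior is a standard normal. The function names `kl_to_standard_normal` and `squared_hellinger_to_standard_normal` are hypothetical, and the closed forms are the standard ones for Gaussians, not necessarily the exact loss used by the authors.

```python
import math

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return sum(
        0.5 * (m * m + s * s - 1.0 - 2.0 * math.log(s))
        for m, s in zip(mu, sigma)
    )

def squared_hellinger_to_standard_normal(mu, sigma):
    """Squared Hellinger distance between N(mu, diag(sigma^2)) and N(0, I).

    Uses H^2 = 1 - BC, where the Bhattacharyya coefficient BC factorizes
    over independent dimensions of a diagonal Gaussian.
    """
    bc = 1.0
    for m, s in zip(mu, sigma):
        var_sum = s * s + 1.0
        bc *= math.sqrt(2.0 * s / var_sum) * math.exp(-(m * m) / (4.0 * var_sum))
    return 1.0 - bc

# Hypothetical latent parameters: as the posterior mean drifts from the prior,
# the KL term grows without bound while the squared Hellinger term stays in [0, 1].
for shift in (0.0, 1.0, 10.0):
    mu, sigma = [shift, shift], [1.0, 1.0]
    print(shift, kl_to_standard_normal(mu, sigma),
          squared_hellinger_to_standard_normal(mu, sigma))
```

One property visible in this sketch is that the Hellinger distance is bounded and symmetric, whereas the KL divergence is neither. Whether this is the specific weakness of the KL divergence that the paper targets is not stated in the abstract.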

