Hellinger Distance Based Conditional Variational Auto-Encoder and Its Application in Raw Audio Generation

Yutian Wang, T. Hu, Zheng Wang, Juanjuan Cai, Huaichang Du
Published in: 2018 IEEE 18th International Conference on Communication Technology (ICCT), 2018-10-01
DOI: 10.1109/ICCT.2018.8600275 (https://doi.org/10.1109/ICCT.2018.8600275)
Citations: 0

Abstract

Audio generation plays an important role in human-computer interaction applications. However, machine-generated audio is still far from natural sound, especially in expressiveness and complexity. The conditional variational auto-encoder (cVAE) has achieved excellent results in data generation, but the original cVAE cannot avoid the defects caused by the KL divergence, which it uses to measure the discrepancy between probability distributions. This paper introduces the Hellinger distance into the cVAE model. First, experiments show that using the Hellinger distance effectively mitigates the weaknesses of the KL divergence. Next, the relationship between the latent-space parameters and the quality of the generated music is analyzed experimentally, and the best generative parameter is found to be the distribution centroid. Finally, the generated music is evaluated subjectively, and the results show that it is significantly better than the output of the original model.
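The abstract does not state the exact regularization term used in the paper. As a rough illustration of the contrast it draws, the sketch below compares the two measures under an assumption that is not taken from the source: a diagonal Gaussian posterior N(mu, sigma^2) measured against a standard normal prior N(0, I), for which both quantities have well-known closed forms. The function names are hypothetical.

```python
import math

# Assumption (not from the paper): diagonal Gaussian posterior N(mu, sigma^2)
# compared against a standard normal prior N(0, I).

def kl_to_standard_normal(mu, sigma):
    """KL(N(mu, sigma^2) || N(0, I)), summed over dimensions. Unbounded above."""
    return sum(
        0.5 * (s * s + m * m - 1.0 - 2.0 * math.log(s))
        for m, s in zip(mu, sigma)
    )

def hellinger_sq_to_standard_normal(mu, sigma):
    """Squared Hellinger distance H^2 = 1 - BC, always within [0, 1].

    BC is the Bhattacharyya coefficient. For diagonal Gaussians it factorizes
    over dimensions, so its logarithm is accumulated for numerical stability.
    """
    log_bc = 0.0
    for m, s in zip(mu, sigma):
        var_sum = s * s + 1.0  # sigma^2 + prior variance (1)
        log_bc += 0.5 * math.log(2.0 * s / var_sum) - (m * m) / (4.0 * var_sum)
    return 1.0 - math.exp(log_bc)

if __name__ == "__main__":
    # As the posterior mean drifts away from the prior, KL keeps growing
    # while the squared Hellinger distance saturates below 1.
    for shift in (0.0, 1.0, 5.0, 10.0):
        kl = kl_to_standard_normal([shift], [1.0])
        h2 = hellinger_sq_to_standard_normal([shift], [1.0])
        print(f"mu={shift:5.1f}  KL={kl:8.3f}  H^2={h2:.6f}")
```

Under these assumptions the Hellinger term is symmetric and bounded, whereas the KL term grows without limit as the posterior moves away from the prior. This is one commonly cited difference between the two measures; whether it is the specific weakness the authors address is not stated in the abstract.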