Yutian Wang, T. Hu, Zheng Wang, Juanjuan Cai, Huaichang Du
Hellinger Distance Based Conditional Variational Auto-Encoder and Its Application in Raw Audio Generation
2018 IEEE 18th International Conference on Communication Technology (ICCT), October 2018. DOI: 10.1109/ICCT.2018.8600275 (https://doi.org/10.1109/ICCT.2018.8600275)
Citations: 0
Abstract
Audio generation plays an important role in human-computer interaction applications. However, machine-generated audio is still far from natural sound, especially in expressiveness and complexity. The conditional variational auto-encoder (cVAE) has achieved excellent results in data generation, but the original cVAE cannot avoid the defects caused by the KL divergence, which it uses to measure the discrepancy between probability distributions. This paper introduces the Hellinger distance into the cVAE model. First, experiments show that using the Hellinger distance effectively mitigates the weaknesses of the KL divergence. Second, the relationship between the latent-space parameters and the quality of the generated music is analyzed experimentally, and the best generative parameter is found to be the distribution centroid. Finally, the generated music is evaluated subjectively, and the results show that it is significantly better than the music generated by the original model.
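The abstract does not give the exact form of the regularizer, so the following is only an illustrative sketch, not the authors' implementation. It assumes the usual VAE setting in which the encoder outputs a diagonal Gaussian N(mu, diag(sigma^2)) and the prior is the standard normal N(0, I), and it compares the closed-form KL divergence with the closed-form squared Hellinger distance for that case. The function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over latent dimensions."""
    return sum(
        0.5 * (s * s + m * m - 1.0 - math.log(s * s))
        for m, s in zip(mu, sigma)
    )


def squared_hellinger_to_standard_normal(mu, sigma):
    """Squared Hellinger distance H^2 = 1 - BC between N(mu, diag(sigma^2))
    and N(0, I), where BC is the Bhattacharyya coefficient.

    For diagonal Gaussians BC factorizes over dimensions, so its logarithm
    is accumulated as a sum for numerical stability.
    """
    log_bc = 0.0
    for m, s in zip(mu, sigma):
        var_sum = s * s + 1.0  # sigma^2 + prior variance (assumed to be 1)
        log_bc += 0.5 * math.log(2.0 * s / var_sum) - m * m / (4.0 * var_sum)
    return 1.0 - math.exp(log_bc)


if __name__ == "__main__":
    # Hypothetical encoder outputs for a 2-dimensional latent space.
    for mu, sigma in [([0.0, 0.0], [1.0, 1.0]), ([3.0, -2.0], [0.5, 2.0])]:
        print(mu, sigma,
              kl_to_standard_normal(mu, sigma),
              squared_hellinger_to_standard_normal(mu, sigma))
```

Both quantities are zero when the posterior equals the prior. The KL term grows without bound as the posterior moves away from the prior, whereas the squared Hellinger distance is symmetric and stays within [0, 1]. Whether this boundedness is the specific property the paper exploits is an assumption here, since the abstract does not say which defects of the KL divergence it targets.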