Fre-GAN 2: Fast and Efficient Frequency-Consistent Audio Synthesis

Sang-Hoon Lee, Ji-Hoon Kim, Kangeun Lee, Seong-Whan Lee
{"title":"FRE-GAN 2: Fast and Efficient Frequency-Consistent Audio Synthesis","authors":"Sang-Hoon Lee, Ji-Hoon Kim, Kangeun Lee, Seong-Whan Lee","doi":"10.1109/icassp43922.2022.9746675","DOIUrl":null,"url":null,"abstract":"Although recent advances in neural vocoder have shown significant improvement, most of these models have a trade-off between audio quality and computational complexity. Since the large model has a limitation on the low-resource devices, a more efficient neural vocoder should synthesize high-quality audio for practical applicability. In this paper, we present Fre-GAN 2, a fast and efficient high-quality audio synthesis model. For fast synthesis, Fre-GAN 2 only synthesizes low and high-frequency parts of the audio, and we leverage the inverse discrete wavelet transform to reproduce the target-resolution audio in the generator. Additionally, we also introduce adversarial periodic feature distillation, which makes the model synthesize high-quality audio with only a small parameter. The experimental results show the superiority of Fre-GAN 2 in audio quality. Furthermore, Fre-GAN 2 has a 10.91× generation acceleration, and the parameters are compressed by 21.23× than Fre-GAN.","PeriodicalId":272439,"journal":{"name":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/icassp43922.2022.9746675","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Although recent advances in neural vocoders have shown significant improvement, most of these models involve a trade-off between audio quality and computational complexity. Since large models are impractical on low-resource devices, a more efficient neural vocoder is needed to synthesize high-quality audio for practical applicability. In this paper, we present Fre-GAN 2, a fast and efficient high-quality audio synthesis model. For fast synthesis, Fre-GAN 2 only synthesizes the low- and high-frequency parts of the audio, and we leverage the inverse discrete wavelet transform to reproduce the target-resolution audio in the generator. Additionally, we introduce adversarial periodic feature distillation, which enables the model to synthesize high-quality audio with only a small number of parameters. The experimental results show the superiority of Fre-GAN 2 in audio quality. Furthermore, Fre-GAN 2 achieves a 10.91× generation speed-up over Fre-GAN, with its parameters compressed by 21.23×.
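As a rough illustration of the reconstruction step described in the abstract, the sketch below (not the authors' code; the library, wavelet choice, number of decomposition levels, and signal sizes are illustrative assumptions) uses PyWavelets to merge two half-resolution sub-band signals into a full-resolution waveform with a single-level inverse discrete wavelet transform.

```python
# Minimal sketch of iDWT-based waveform reconstruction, assuming the
# generator emits one low-frequency and one high-frequency sub-band at
# half the target length. The 'haar' wavelet and the sizes below are
# placeholders, not taken from the paper.
import numpy as np
import pywt

n_samples = 8192                          # assumed target waveform length

# Stand-ins for the generator's two sub-band outputs.
low_band = np.random.randn(n_samples // 2)
high_band = np.random.randn(n_samples // 2)

# A single-level inverse DWT merges the sub-bands into the
# target-resolution waveform, so the generator itself never has to
# upsample all the way to the final sample rate.
waveform = pywt.idwt(low_band, high_band, wavelet="haar")
assert waveform.shape == (n_samples,)
```

The design rationale suggested by the abstract is that the generator can stay small and fast because it works on shorter sub-band signals, while the fixed, parameter-free inverse DWT recovers the target resolution.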