Audio Future Block Prediction with Conditional Generative Adversarial Network

2019 3rd International Conference on Electrical, Computer & Telecommunication Engineering (ICECTE) · Pub Date: 2019-12-26 · DOI: 10.1109/ICECTE48615.2019.9303563

Md. Rahat-uz-Zaman, Shadmaan Hye, Mahmudul Hasan

Abstract

Signal processing is a vast subfield of electrical engineering and computer science, and audio signal processing holds an important place within it, notably for restoring corrupted or missing audio blocks. However, generating a plausible future audio block from the preceding block is still a new idea, one that can help both to reduce audio noise and to recover partially missing audio segments. In this paper, a generative adversarial network (GAN), together with a processing pipeline, is proposed to predict the audio likely to follow an input audio sequence. The proposed model applies the short-time Fourier transform (STFT) to the audio to turn it into an image. The image is then fed to a conditional GAN, which predicts the output image. Finally, the inverse short-time Fourier transform is applied to the predicted image, producing the predicted audio sequence. For short audio sequences, the proposed method is fast and robust, and it achieves a loss of 0.43. It may therefore work well when deployed in voice-call and broadcasting applications.
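The pipeline in the abstract (STFT, image, conditional GAN, inverse STFT) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the STFT settings (`N_FFT`, `HOP`), the two-channel real/imaginary "image", the block size `k`, the hypothetical `persistence_predictor` standing in for the trained generator, and the pix2pix-style objective (adversarial term plus an L1 term weighted by `lam`) are all illustrative choices. The abstract does not say which loss produced the reported value of 0.43.

```python
import numpy as np

N_FFT, HOP = 256, 64  # hypothetical STFT settings; not taken from the paper
EPS = 1e-12


def _window(n_fft=N_FFT):
    # Periodic Hann window: 0.5 - 0.5*cos(2*pi*n/n_fft), n = 0..n_fft-1
    return np.hanning(n_fft + 1)[:-1]


def stft(x, n_fft=N_FFT, hop=HOP):
    """Complex STFT of a 1-D signal, shape (frames, n_fft // 2 + 1)."""
    win = _window(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=N_FFT, hop=HOP):
    """Weighted overlap-add inverse of stft()."""
    win = _window(n_fft)
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * win
    length = n_fft + hop * (len(frames) - 1)
    out, norm = np.zeros(length), np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    # Divide by the summed squared window; leave samples with ~zero weight at 0.
    return np.divide(out, norm, out=np.zeros_like(out), where=norm > 1e-10)


def to_image(spec):
    # Assumption: real/imag parts as two channels so phase survives the round trip.
    return np.stack([spec.real, spec.imag])  # (2, frames, bins)


def from_image(img):
    return img[0] + 1j * img[1]


def make_pair(img, k):
    # Condition = k past frames, target = the next k frames (hypothetical block size k).
    return img[:, :k], img[:, k:2 * k]


def persistence_predictor(cond):
    # Hypothetical stand-in for the trained cGAN generator: repeat the past block.
    return cond.copy()


def discriminator_loss(d_real, d_fake):
    # D should score (condition, real future) as 1 and (condition, generated future) as 0.
    return -np.mean(np.log(d_real + EPS) + np.log(1.0 - d_fake + EPS))


def generator_loss(d_fake, fake, target, lam=100.0):
    # Non-saturating adversarial term plus an L1 term pulling G toward the true future block.
    return -np.mean(np.log(d_fake + EPS)) + lam * np.mean(np.abs(fake - target))


# Usage: a noisy 440 Hz tone at 8 kHz, split into a past block and a future block.
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
x = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(t.size)

img = to_image(stft(x))
cond, target = make_pair(img, k=16)
pred = persistence_predictor(cond)
audio_future = istft(from_image(pred))
```

Keeping the real and imaginary parts as separate channels makes the inverse STFT exact without phase estimation; a magnitude-only image would instead need a phase-reconstruction step (for example, Griffin-Lim) before inversion.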
