{"title":"Speech Enhancement Using Variational Autoencoders","authors":"A. Punnoose","doi":"10.1109/IConSCEPT57958.2023.10170608","DOIUrl":null,"url":null,"abstract":"This paper discusses the experimental details of speech enhancement using variational autoencoders (VAE). A joint VAE architecture is formulated, and a training protocol that strikes a balance between speech enhancement and VAE correctness is defined. Extended short-term objective intelligibility (ESTOI) is used to measure the intelligibility of enhanced speech. The proposed approach is implemented using MFCC and STFT features on a benchmark dataset and we report, on an average, 2 times improvement in ESTOI for enhanced speech using MFCC over STFT features across all noise levels. Further, the proposed approach using MFCC features shows significant improvement in denoising very noisy speech, as opposed to marginal improvement on relatively clean speech.","PeriodicalId":240167,"journal":{"name":"2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IConSCEPT57958.2023.10170608","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
This paper discusses the experimental details of speech enhancement using variational autoencoders (VAEs). A joint VAE architecture is formulated, and a training protocol is defined that balances speech enhancement against VAE correctness. Extended short-time objective intelligibility (ESTOI) is used to measure the intelligibility of the enhanced speech. The proposed approach is implemented with MFCC and STFT features on a benchmark dataset; on average, MFCC features yield a twofold improvement in ESTOI over STFT features across all noise levels. Further, the MFCC-based approach shows significant improvement in denoising very noisy speech, as opposed to marginal improvement on relatively clean speech.
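The abstract describes a training protocol that trades off enhancement quality (reconstruction) against VAE correctness (staying close to the latent prior). As a minimal illustrative sketch — not the paper's joint architecture — the snippet below shows the core VAE objective with the reparameterization trick and a weighting factor `beta` on the KL term; the layer sizes, the fixed random weights, and `beta` itself are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not from the paper):
# 13 MFCC coefficients per frame, 4-dimensional latent space.
D, Z = 13, 4

# Fixed random linear encoder/decoder weights stand in for trained networks.
W_mu = rng.normal(0.0, 0.1, (Z, D))
W_logvar = rng.normal(0.0, 0.1, (Z, D))
W_dec = rng.normal(0.0, 0.1, (D, Z))

def vae_loss(x, beta=1.0):
    """ELBO-style loss: reconstruction error plus beta-weighted KL term.

    A scalar weight like beta is one simple way to balance enhancement
    quality (reconstruction) against VAE correctness (KL to the unit
    Gaussian prior), in the spirit of the training protocol the paper
    describes.
    """
    mu = W_mu @ x
    logvar = W_logvar @ x
    eps = rng.standard_normal(Z)
    z = mu + np.exp(0.5 * logvar) * eps       # reparameterization trick
    x_hat = W_dec @ z                         # decoder reconstruction
    recon = np.sum((x - x_hat) ** 2)          # reconstruction error
    # KL divergence from q(z|x) = N(mu, diag(exp(logvar))) to N(0, I)
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    return recon + beta * kl, kl

x = rng.standard_normal(D)                    # stand-in for one MFCC frame
loss, kl = vae_loss(x, beta=0.5)
```

Lowering `beta` favors reconstruction (sharper enhancement) at the cost of a less regular latent space; raising it does the opposite — the balance the paper's training protocol aims to strike.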