Speech Enhancement Using Variational Autoencoders

A. Punnoose
Published in: 2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)
Publication date: 2023-05-25
DOI: 10.1109/IConSCEPT57958.2023.10170608
URL: https://doi.org/10.1109/IConSCEPT57958.2023.10170608
Citations: 0

Abstract

This paper presents the experimental details of speech enhancement using variational autoencoders (VAEs). A joint VAE architecture is formulated, and a training protocol is defined that balances speech enhancement against VAE correctness. Extended short-time objective intelligibility (ESTOI) is used to measure the intelligibility of the enhanced speech. The proposed approach is implemented with MFCC and STFT features on a benchmark dataset. On average, enhanced speech obtained with MFCC features shows a twofold improvement in ESTOI over STFT features across all noise levels. Further, with MFCC features the approach yields a significant improvement when denoising very noisy speech, but only a marginal improvement on relatively clean speech.
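The abstract does not spell out the joint architecture or the training protocol, so the following is only a minimal sketch of how a balance between an enhancement term and a VAE regularization term is commonly expressed. It assumes a generic weighted VAE objective: mean squared error against clean features plus a KL divergence to a standard normal prior, scaled by a weight `beta`. The function names, the `beta` parameter, and the choice of MSE are illustrative assumptions, not the paper's method.

```python
import math
import random


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mu, log_var)
    ]


def mse(target, estimate):
    """Mean squared error between two equal-length feature vectors."""
    return sum((t - e) ** 2 for t, e in zip(target, estimate)) / len(target)


def weighted_vae_loss(clean, reconstructed, mu, log_var, beta=1.0):
    """Enhancement term (MSE to the clean features) plus beta-weighted KL term.

    `beta` is a hypothetical knob: larger values favor a well-behaved latent
    distribution, smaller values favor reconstruction of the clean features.
    """
    return mse(clean, reconstructed) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [1.0, 0.0], [0.0, 0.0]
    z = reparameterize(mu, log_var, rng)  # latent sample fed to a decoder
    print(len(z), weighted_vae_loss([1.0, 2.0], [1.0, 2.0], mu, log_var, beta=2.0))
```

In a real system the encoder would produce `mu` and `log_var` from noisy MFCC or STFT frames and a decoder would map the sampled latent vector to enhanced features; both networks are omitted here.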