Speech Enhancement with Background Noise Suppression in Various Data Corpus Using Bi-LSTM Algorithm

Vinothkumar G, Manoj Kumar D

International Journal of Electrical and Electronics Research. Published 2024-03-28. DOI: https://doi.org/10.37391/ijeer.120144

Abstract

Noise reduction is a crucial step in today's teleconferencing scenarios. The signal-to-noise ratio (SNR) is a key factor in reducing the bit error rate (BER): raising the SNR lowers the BER, which improves the reliability and performance of the communication system. The microphone is the primary audio input device that captures the input signal. As the signal travels, white noise and phase noise interfere with it, so the output signal is a combination of the input signal and reverberation noise. Our aim is to minimize the interfering noise and thereby improve the SNR. To achieve this, we develop a real-time speech enhancement method that uses an enhanced recurrent neural network with Bidirectional Long Short-Term Memory (Bi-LSTM). In this sequence-processing framework, one LSTM reads the input in the forward direction and a second LSTM reads it in the reverse direction; together they make up the Bi-LSTM. The Bi-LSTM requires fewer tensor operations, which makes it faster and more efficient. It is trained in real time on various noise signals. The trained system recovers the speech signal unaltered by suppressing the noise component, and its performance is comparable to that of other noise-suppression systems. Compared with existing algorithms, the STOI (Short-Time Objective Intelligibility) and PESQ (Perceptual Evaluation of Speech Quality) scores improve by approximately 0.5% to 14.8% and 1.77% to 29.8%, respectively, across various sound types and input SNR levels.
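
The two-direction structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' network: it assumes a toy LSTM cell with a scalar state, and the names `lstm_pass`, `bilstm`, `W_FWD`, and `W_BWD`, along with the weight values, are hypothetical and untrained. It only shows how one pass reads the frames in order, the other reads them reversed, and the two hidden states are paired per frame.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(p, x, h):
    # p = (input weight, recurrent weight, bias) for one gate
    return p[0] * x + p[1] * h + p[2]


def lstm_pass(xs, w):
    """Run a scalar-state LSTM over xs and return the hidden state per step."""
    h, c = 0.0, 0.0
    states = []
    for x in xs:
        i = sigmoid(affine(w["i"], x, h))    # input gate
        f = sigmoid(affine(w["f"], x, h))    # forget gate
        o = sigmoid(affine(w["o"], x, h))    # output gate
        g = math.tanh(affine(w["g"], x, h))  # candidate cell value
        c = f * c + i * g                    # updated cell state
        h = o * math.tanh(c)                 # updated hidden state
        states.append(h)
    return states


def bilstm(xs, w_fwd, w_bwd):
    """Pair the forward-pass and backward-pass hidden states for each frame."""
    forward = lstm_pass(xs, w_fwd)
    # Reverse the input, run the second LSTM, then restore the frame order.
    backward = lstm_pass(xs[::-1], w_bwd)[::-1]
    return list(zip(forward, backward))


# Hypothetical, untrained weights chosen only for illustration.
W_FWD = {"i": (0.5, 0.1, 0.0), "f": (0.4, 0.2, 0.1),
         "o": (0.3, 0.1, 0.0), "g": (1.0, 0.5, 0.0)}
W_BWD = {"i": (0.6, 0.2, 0.0), "f": (0.3, 0.1, 0.2),
         "o": (0.4, 0.2, 0.0), "g": (0.8, 0.4, 0.0)}

if __name__ == "__main__":
    frames = [0.2, 0.5, 0.1, 0.9]  # stand-in for per-frame features
    for t, (fwd, bwd) in enumerate(bilstm(frames, W_FWD, W_BWD)):
        print(t, round(fwd, 4), round(bwd, 4))
```

In this sketch the forward state at a frame depends only on that frame and earlier ones, while the backward state depends on that frame and later ones. A practical consequence is that the backward pass needs the whole segment before it can start, so a streaming setup would presumably operate on buffered chunks of frames.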