Speech Enhancement with Background Noise Suppression in Various Data Corpus Using Bi-LSTM Algorithm

International Journal of Electrical and Electronics Research. Pub Date: 2024-03-28. DOI: 10.37391/ijeer.120144

Vinothkumar G, Manoj Kumar D



Abstract

Noise reduction is a crucial step in today’s teleconferencing scenarios. The signal-to-noise ratio (SNR) is a key factor in reducing the bit error rate (BER): a higher SNR yields a lower BER, which improves the reliability and performance of the communication system. The microphone is the primary audio input device that captures the input signal; as the signal is transmitted, it is corrupted by white noise and phase noise. Thus, the output signal is a combination of the input signal and reverberation noise. Our idea is to minimize the interfering noise and thereby improve the SNR. To achieve this, we develop a real-time speech-enhancement method that uses an enhanced recurrent neural network with Bidirectional Long Short-Term Memory (Bi-LSTM). In this sequence-processing framework, one LSTM reads the input in the forward direction while a second LSTM reads it in the reverse direction; together they make up the Bi-LSTM. The Bi-LSTM requires fewer tensor operations, which makes it faster and more efficient. The Bi-LSTM is trained in real time using various noise signals. The trained system is then used to recover the clean signal by suppressing the noise, making the proposed system comparable to other noise-suppression systems. The STOI and PESQ scores improve by approximately 0.5% to 14.8% and 1.77% to 29.8%, respectively, over existing algorithms across various sound types and input SNR levels.
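The forward and reverse passes described in the abstract can be illustrated with a minimal sketch. It assumes a single LSTM unit per direction, scalar input frames, and hand-picked, untrained weights; the names `make_params`, `lstm_pass`, and `bilstm` are hypothetical and are not taken from the paper, whose network size, features, and training setup are not given here.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def make_params(w: float, u: float, b: float) -> dict:
    """Hypothetical helper: share one input weight, recurrent weight, and bias
    across the four LSTM gates (illustrative values only, not trained)."""
    return {"w": w, "u": u, "b": b}


def lstm_pass(frames, p):
    """Run a one-unit LSTM over scalar frames and return the hidden state per step."""
    h, c = 0.0, 0.0
    outputs = []
    for x in frames:
        pre = p["w"] * x + p["u"] * h + p["b"]  # shared pre-activation (assumption)
        i = sigmoid(pre)      # input gate
        f = sigmoid(pre)      # forget gate
        o = sigmoid(pre)      # output gate
        g = math.tanh(pre)    # candidate cell value
        c = f * c + i * g     # update the cell state
        h = o * math.tanh(c)  # expose the gated cell state as hidden state
        outputs.append(h)
    return outputs


def bilstm(frames, fwd_params, bwd_params):
    """Pair each frame's forward-pass state with its reverse-pass state."""
    forward = lstm_pass(frames, fwd_params)
    # Reverse the input, run the second LSTM, then restore the time order.
    backward = lstm_pass(frames[::-1], bwd_params)[::-1]
    return list(zip(forward, backward))


fwd = make_params(1.0, 0.5, 0.0)
bwd = make_params(0.8, -0.4, 0.1)
noisy = [0.2, -0.1, 0.4, 0.0, -0.3]  # made-up noisy samples
features = bilstm(noisy, fwd, bwd)

# Changing only the last frame leaves the first forward state untouched,
# while the backward state at the last position changes.
altered = bilstm(noisy[:-1] + [0.9], fwd, bwd)
```

Each element of `features` holds two values for one time step: the first summarizes the frames up to that step, the second summarizes the frames from that step onward. In a full enhancement model these paired states would feed a further layer that estimates the clean speech; that part is omitted here because the abstract does not specify it.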

