Robust Speech Enhancement Using Dabauchies Wavelet Based Adaptive Wavelet Thresholding for the Development of Robust Automatic Speech Recognition: A Comprehensive Review

IF 2.2 | Zone 4, Computer Science | Q3, Telecommunications

Wireless Personal Communications Pub Date : 2024-08-06 DOI:10.1007/s11277-024-11448-x

Mahadevaswamy Shanthamallappa


Citations: 0

Abstract

Developing a robust Automatic Speech Recognition (ASR) system is a major challenge in speech signal processing research. These systems perform exceedingly well in clean environments, but their performance becomes unacceptable when the speech signal is corrupted by environmental and other artificial noise. The efficiency of any ASR system depends on several factors, such as the size of the vocabulary, native-language influences, the transmission channel, the emotional state, health, and age of the speaker, the design of the speech corpus, the size of the dataset, the training and testing strategy, and preprocessing. It is well known that noise in a speech signal degrades its perceptual quality and intelligibility, and hence ASR system performance. This paper therefore proposes a Daubechies wavelet-based, time-adaptive Bayes thresholding algorithm with a custom Wavelet Packet Decomposition and Reconstruction Tree. The proposed system is evaluated on the Private Kannada Dataset and the TIMIT dataset. The results show that the proposed system is effective at SNR levels of −10, −5, 0, 5, 10, 15, 20, 25, and 30 dB. The article begins with an introduction to ASR, the physiological process of human speech production and perception, ASR jargon, the architecture of ASR, and the barriers associated with ASR design. It also covers dataset design and baseline speech enhancement methods. This work provides a comprehensive review of wavelet-based speech enhancement for researchers working in speech signal processing.
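The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general technique it names (wavelet thresholding with a Bayes-style threshold, scored by SNR), not the author's method. It rests on several assumptions that are not from the paper: it uses the Haar wavelet (which is the first Daubechies wavelet, db1), a plain multi-level discrete wavelet transform rather than the custom wavelet packet tree, the standard BayesShrink rule with a median-based noise estimate, and a synthetic sine wave in place of speech. All function names are illustrative.

```python
import math
import random
import statistics

SQRT2 = math.sqrt(2.0)

def haar_dwt(x):
    """One level of the orthonormal Haar (db1) transform; len(x) must be even."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert haar_dwt exactly."""
    out = []
    for a, d in zip(approx, detail):
        out.extend(((a + d) / SQRT2, (a - d) / SQRT2))
    return out

def soft_threshold(c, t):
    """Shrink a coefficient toward zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)

def bayes_threshold(detail, sigma_n):
    """BayesShrink rule: T = sigma_n^2 / sigma_x for one detail band."""
    power = sum(d * d for d in detail) / len(detail)
    sigma_x = math.sqrt(max(power - sigma_n ** 2, 0.0))
    if sigma_x == 0.0:
        # Band looks like pure noise: pick a threshold that zeroes it.
        return max(abs(d) for d in detail)
    return sigma_n ** 2 / sigma_x

def denoise(x, levels=3):
    """Decompose, threshold each detail band, reconstruct.

    len(x) must be divisible by 2**levels.
    """
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    # Robust noise estimate from the finest detail band.
    sigma_n = statistics.median(abs(d) for d in details[0]) / 0.6745
    for d in reversed(details):
        t = bayes_threshold(d, sigma_n)
        approx = haar_idwt(approx, [soft_threshold(c, t) for c in d])
    return approx

def add_noise(clean, target_snr_db, rng):
    """Add white Gaussian noise scaled to a target SNR in dB."""
    power = sum(c * c for c in clean) / len(clean)
    sigma = math.sqrt(power / 10 ** (target_snr_db / 10))
    return [c + rng.gauss(0.0, sigma) for c in clean]

def snr_db(clean, estimate):
    """SNR of an estimate measured against the clean reference."""
    signal = sum(c * c for c in clean)
    error = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal / error)

if __name__ == "__main__":
    rng = random.Random(0)
    clean = [math.sin(2 * math.pi * n / 128) for n in range(1024)]
    for level in (-10, 0, 10):  # a subset of the SNR levels listed above
        noisy = add_noise(clean, level, rng)
        print(level, round(snr_db(clean, noisy), 2),
              round(snr_db(clean, denoise(noisy)), 2))
```

Each printed row gives the target SNR, the measured input SNR, and the SNR after denoising. A per-band threshold is used here because the noise-to-signal ratio differs between decomposition levels; a time-adaptive scheme such as the one the abstract describes would additionally vary the threshold along the signal.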

Source journal

Wireless Personal Communications (Engineering & Technology: Telecommunications)

CiteScore: 5.80

Self-citation rate: 9.10%

Articles published: 663

Review time: 6.8 months

Journal description: The Journal on Mobile Communication and Computing ... Publishes tutorial, survey, and original research papers addressing mobile communications and computing; Investigates theoretical, engineering, and experimental aspects of radio communications, voice, data, images, and multimedia; Explores propagation, system models, speech and image coding, multiple access techniques, protocols, performance evaluation, radio local area networks, and networking and architectures, etc.; 98% of authors who answered a survey reported that they would definitely publish or probably publish in the journal again. Wireless Personal Communications is an archival, peer-reviewed, scientific and technical journal addressing mobile communications and computing. It investigates theoretical, engineering, and experimental aspects of radio communications, voice, data, images, and multimedia. A partial list of topics included in the journal is: propagation, system models, speech and image coding, multiple access techniques, protocols, performance evaluation, radio local area networks, and networking and architectures. In addition to the above-mentioned areas, the journal also accepts papers that deal with interdisciplinary aspects of wireless communications along with: big data and analytics, business and economy, society, and the environment. The journal features five principal types of papers: full technical papers, short papers, technical aspects of policy and standardization, letters offering new research thoughts and experimental ideas, and invited papers on important and emerging topics authored by renowned experts.