Robust Speech Enhancement Using Dabauchies Wavelet Based Adaptive Wavelet Thresholding for the Development of Robust Automatic Speech Recognition: A Comprehensive Review
Author: Mahadevaswamy Shanthamallappa
Journal: Wireless Personal Communications (impact factor 1.9; JCR Q3, Telecommunications)
Published: 2024-08-06
DOI: 10.1007/s11277-024-11448-x (https://doi.org/10.1007/s11277-024-11448-x)
Citations: 0
Abstract
Developing a robust Automatic Speech Recognition (ASR) system is a major challenge in speech signal processing research. Such systems perform very well in clean environments, but their performance becomes unacceptable when the speech signal is corrupted by environmental and artificial noise. The performance of an ASR system depends on many factors, including vocabulary size, native-language influence, the transmission channel, the speaker's emotional state, health, and age, the design of the speech corpus, the size of the dataset, the training and testing strategy, and preprocessing. It is well known that noise degrades the perceptual quality and intelligibility of speech, and hence ASR performance. This paper therefore proposes a Daubechies-wavelet-based, time-adaptive Bayes thresholding algorithm with a custom Wavelet Packet Decomposition and Reconstruction Tree. The proposed system is evaluated on a private Kannada dataset and the TIMIT dataset. The results show that it is effective at SNR levels of −10, −5, 0, 5, 10, 15, 20, 25, and 30 dB. The article opens with an introduction to ASR, the physiological processes of human speech production and perception, ASR jargon, the architecture of ASR systems, and barriers in ASR design. It also covers dataset design and baseline speech enhancement methods. The work offers a comprehensive review of wavelet-based speech enhancement for researchers in speech signal processing.
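To make the thresholding idea in the abstract concrete, the following is a minimal sketch of wavelet denoising, not the paper's algorithm. It assumes a plain multi-level discrete wavelet transform with the 4-tap Daubechies (db2) filter and periodic boundaries, the standard BayesShrink subband threshold, and a noise level estimated from the median of the finest detail coefficients. The paper's custom wavelet packet tree and time-adaptive rule are not specified in the abstract and are not reproduced here. All function names (dwt, idwt, soft, bayes_threshold, denoise, snr_db) and the test signal are illustrative assumptions.

```python
import math
import random
from statistics import median

# Daubechies db2 (4-tap) analysis filters. The transform is orthonormal,
# so the inverse is simply the transpose of the forward transform.
S3, NORM = math.sqrt(3.0), 4.0 * math.sqrt(2.0)
LO = [(1 + S3) / NORM, (3 + S3) / NORM, (3 - S3) / NORM, (1 - S3) / NORM]
HI = [LO[3], -LO[2], LO[1], -LO[0]]


def dwt(x):
    """One level of periodic db2 analysis; len(x) must be even and >= 4."""
    n = len(x)
    approx = [sum(LO[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    detail = [sum(HI[k] * x[(2 * i + k) % n] for k in range(4)) for i in range(n // 2)]
    return approx, detail


def idwt(approx, detail):
    """Inverse of dwt(): scatter each coefficient pair back through the filters."""
    n = 2 * len(approx)
    x = [0.0] * n
    for i, (a, d) in enumerate(zip(approx, detail)):
        for k in range(4):
            x[(2 * i + k) % n] += LO[k] * a + HI[k] * d
    return x


def soft(c, t):
    """Soft thresholding: shrink the magnitude of c by t, clamping at zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def bayes_threshold(detail, sigma):
    """BayesShrink threshold sigma^2 / sigma_x for one detail subband."""
    var_y = sum(c * c for c in detail) / len(detail)
    sigma_x = math.sqrt(max(var_y - sigma * sigma, 0.0))
    if sigma_x == 0.0:
        return max(abs(c) for c in detail)  # band looks like pure noise: zero it
    return sigma * sigma / sigma_x


def denoise(x, levels=4):
    """Decompose, threshold each detail band, and reconstruct.

    len(x) must be divisible by 2**levels, with at least 4 samples
    entering the last decomposition level.
    """
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = dwt(approx)
        details.append(d)
    # Robust noise estimate from the finest detail band (median absolute value).
    sigma = median(abs(c) for c in details[0]) / 0.6745
    for d in reversed(details):
        t = bayes_threshold(d, sigma)
        approx = idwt(approx, [soft(c, t) for c in d])
    return approx


def snr_db(clean, estimate):
    """SNR in dB of an estimate measured against the clean reference."""
    signal = sum(c * c for c in clean)
    error = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    return 10.0 * math.log10(signal / error)


if __name__ == "__main__":
    random.seed(0)
    clean = [math.sin(2 * math.pi * i / 128) for i in range(1024)]  # toy stand-in for speech
    noisy = [c + random.gauss(0.0, 0.3) for c in clean]
    print("input SNR (dB): ", snr_db(clean, noisy))
    print("output SNR (dB):", snr_db(clean, denoise(noisy)))
```

Comparing snr_db before and after denoise() mirrors the kind of SNR-based evaluation the abstract describes, although a slowly varying sine with white Gaussian noise is far simpler than real speech in environmental noise.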
About the journal:
The Journal on Mobile Communication and Computing ...
Publishes tutorial, survey, and original research papers addressing mobile communications and computing;
Investigates theoretical, engineering, and experimental aspects of radio communications, voice, data, images, and multimedia;
Explores propagation, system models, speech and image coding, multiple access techniques, protocols, performance evaluation, radio local area networks, and networking and architectures, etc.;
98% of authors who answered a survey reported that they would definitely publish or probably publish in the journal again.
Wireless Personal Communications is an archival, peer-reviewed, scientific and technical journal addressing mobile communications and computing. It investigates theoretical, engineering, and experimental aspects of radio communications, voice, data, images, and multimedia. A partial list of topics covered by the journal: propagation, system models, speech and image coding, multiple access techniques, protocols, performance evaluation, radio local area networks, and networking and architectures.
In addition to the above-mentioned areas, the journal also accepts papers that deal with interdisciplinary aspects of wireless communications, along with big data and analytics, business and economy, society, and the environment.
The journal features five principal types of papers: full technical papers, short papers, technical aspects of policy and standardization, letters offering new research thoughts and experimental ideas, and invited papers on important and emerging topics authored by renowned experts.