Deep neural networks for speech enhancement and speech recognition: A systematic review

Impact factor: 6.0; Zone 2 (Engineering and Technology); JCR Q1 (ENGINEERING, MULTIDISCIPLINARY)

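Ain Shams Engineering Journal, vol. 16, no. 7, Article 103405. Pub Date: 2025-04-30. DOI: 10.1016/j.asej.2025.103405. URL: https://www.sciencedirect.com/science/article/pii/S2090447925001467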

Sureshkumar Natarajan, Syed Abdul Rahman Al-Haddad, Faisul Arif Ahmad, Raja Kamil, Mohd Khair Hassan, Syaril Azrad, June Francis Macleans, Sadiq H. Abdulhussain, Basheera M. Mahmmod, Nurbek Saparkhojayev, Aigul Dauitbayeva

Citations: 0

Abstract

The field of speech signal processing has undergone significant transformation through extensive research. There is growing interest in Speech Enhancement (SE) and Automatic Speech Recognition (ASR), with SE serving as a crucial preliminary step to enhance ASR performance. This paper addresses key challenges, particularly the need to maintain speech quality and improve intelligibility in ASR systems. Recently, deep learning techniques have emerged as powerful tools for tackling these challenges. This systematic review examines speech enhancement and recognition techniques, emphasizing denoising, acoustic modeling, and beamforming. Various deep learning architectures, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and Hybrid Neural Networks, are reviewed to highlight their roles in enhancement and recognition. The review specifically details their usage, the features utilized in each study, the databases employed, performance, and limitations, all presented in a structured tabular format. This approach provides valuable insights into the strengths and weaknesses of each method, guiding future advancements in the field. In particular, it emphasizes that LSTM-RNN models excel in temporal signal processing, while hybrid models demonstrate superior performance in optimizing task outcomes. The paper conducts a comprehensive statistical analysis of 187 research papers that exclusively utilize deep neural networks to address the challenges of speech enhancement and recognition, presenting the latest advances in the field. The review examines publications from 2012 to 2024, shedding light on research trends and patterns, while the proposed solutions aim to bridge gaps for researchers in this evolving domain.

Source journal

Ain Shams Engineering Journal (Engineering: General Engineering)

CiteScore: 10.80

Self-citation rate: 13.30%

Articles published: 441

Review time: 49 weeks

Journal description: Ain Shams Engineering Journal is an international journal devoted to the publication of peer-reviewed, original, high-quality research papers and review papers on both traditional topics and those of emerging science and technology. It welcomes work of theoretical and fundamental interest as well as work concerning industrial applications, emerging instrumental techniques, and practical applications to aspects of human endeavor such as the preservation of the environment, health, and waste disposal. The overall focus is on original and rigorous scientific research results of generic significance. The journal covers mechanical, electrical, civil, chemical, petroleum, and environmental engineering, as well as architectural and urban planning engineering. Papers that integrate knowledge from other disciplines with engineering are especially welcome, for example nanotechnology, material sciences, and computational methods, as well as applied basic sciences: engineering mathematics, physics, and chemistry.