Sureshkumar Natarajan , Syed Abdul Rahman Al-Haddad , Faisul Arif Ahmad , Raja Kamil , Mohd Khair Hassan , Syaril Azrad , June Francis Macleans , Sadiq H. Abdulhussain , Basheera M. Mahmmod , Nurbek Saparkhojayev , Aigul Dauitbayeva
{"title":"语音增强和语音识别的深度神经网络:系统综述","authors":"Sureshkumar Natarajan , Syed Abdul Rahman Al-Haddad , Faisul Arif Ahmad , Raja Kamil , Mohd Khair Hassan , Syaril Azrad , June Francis Macleans , Sadiq H. Abdulhussain , Basheera M. Mahmmod , Nurbek Saparkhojayev , Aigul Dauitbayeva","doi":"10.1016/j.asej.2025.103405","DOIUrl":null,"url":null,"abstract":"<div><div>The field of speech signal processing has undergone significant transformation through extensive research. There is growing interest in Speech Enhancement (SE) and Automatic Speech Recognition (ASR), with SE serving as a crucial preliminary step to enhance ASR performance. This paper addresses key challenges, particularly the need to maintain speech quality and improve intelligibility in ASR systems. Recently, deep learning techniques have emerged as powerful tools for tackling these challenges. This systematic review examines speech enhancement and recognition techniques, emphasizing denoising, acoustic modeling, and beamforming. Various deep learning architectures, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and Hybrid Neural Networks, are reviewed to highlight their roles in enhancement and recognition. The review specifically details their usage, the features utilized in each study, the databases employed, performance, and limitations, all presented in a structured tabular format. This approach provides valuable insights into the strengths and weaknesses of each method, guiding future advancements in the field. In particular, it emphasizes that LSTM-RNN models excel in temporal signal processing, while hybrid models demonstrate superior performance in optimizing task outcomes. The paper conducts a comprehensive statistical analysis of 187 research papers that exclusively utilize deep neural networks to address the challenges of speech enhancement and recognition, presenting the latest advances in the field. The review examines publications from 2012 to 2024, shedding light on research trends and patterns, while the proposed solutions aim to bridge gaps for researchers in this evolving domain.</div></div>","PeriodicalId":48648,"journal":{"name":"Ain Shams Engineering Journal","volume":"16 7","pages":"Article 103405"},"PeriodicalIF":6.0000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Deep neural networks for speech enhancement and speech recognition: A systematic review\",\"authors\":\"Sureshkumar Natarajan , Syed Abdul Rahman Al-Haddad , Faisul Arif Ahmad , Raja Kamil , Mohd Khair Hassan , Syaril Azrad , June Francis Macleans , Sadiq H. Abdulhussain , Basheera M. Mahmmod , Nurbek Saparkhojayev , Aigul Dauitbayeva\",\"doi\":\"10.1016/j.asej.2025.103405\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>The field of speech signal processing has undergone significant transformation through extensive research. There is growing interest in Speech Enhancement (SE) and Automatic Speech Recognition (ASR), with SE serving as a crucial preliminary step to enhance ASR performance. This paper addresses key challenges, particularly the need to maintain speech quality and improve intelligibility in ASR systems. Recently, deep learning techniques have emerged as powerful tools for tackling these challenges. This systematic review examines speech enhancement and recognition techniques, emphasizing denoising, acoustic modeling, and beamforming. Various deep learning architectures, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and Hybrid Neural Networks, are reviewed to highlight their roles in enhancement and recognition. The review specifically details their usage, the features utilized in each study, the databases employed, performance, and limitations, all presented in a structured tabular format. This approach provides valuable insights into the strengths and weaknesses of each method, guiding future advancements in the field. In particular, it emphasizes that LSTM-RNN models excel in temporal signal processing, while hybrid models demonstrate superior performance in optimizing task outcomes. The paper conducts a comprehensive statistical analysis of 187 research papers that exclusively utilize deep neural networks to address the challenges of speech enhancement and recognition, presenting the latest advances in the field. The review examines publications from 2012 to 2024, shedding light on research trends and patterns, while the proposed solutions aim to bridge gaps for researchers in this evolving domain.</div></div>\",\"PeriodicalId\":48648,\"journal\":{\"name\":\"Ain Shams Engineering Journal\",\"volume\":\"16 7\",\"pages\":\"Article 103405\"},\"PeriodicalIF\":6.0000,\"publicationDate\":\"2025-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Ain Shams Engineering Journal\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2090447925001467\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ain Shams Engineering Journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2090447925001467","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
Deep neural networks for speech enhancement and speech recognition: A systematic review
The field of speech signal processing has undergone significant transformation through extensive research. There is growing interest in Speech Enhancement (SE) and Automatic Speech Recognition (ASR), with SE serving as a crucial preliminary step to enhance ASR performance. This paper addresses key challenges, particularly the need to maintain speech quality and improve intelligibility in ASR systems. Recently, deep learning techniques have emerged as powerful tools for tackling these challenges. This systematic review examines speech enhancement and recognition techniques, emphasizing denoising, acoustic modeling, and beamforming. Various deep learning architectures, such as Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and Hybrid Neural Networks, are reviewed to highlight their roles in enhancement and recognition. The review specifically details their usage, the features utilized in each study, the databases employed, performance, and limitations, all presented in a structured tabular format. This approach provides valuable insights into the strengths and weaknesses of each method, guiding future advancements in the field. In particular, it emphasizes that LSTM-RNN models excel in temporal signal processing, while hybrid models demonstrate superior performance in optimizing task outcomes. The paper conducts a comprehensive statistical analysis of 187 research papers that exclusively utilize deep neural networks to address the challenges of speech enhancement and recognition, presenting the latest advances in the field. The review examines publications from 2012 to 2024, shedding light on research trends and patterns, while the proposed solutions aim to bridge gaps for researchers in this evolving domain.
期刊介绍:
in Shams Engineering Journal is an international journal devoted to publication of peer reviewed original high-quality research papers and review papers in both traditional topics and those of emerging science and technology. Areas of both theoretical and fundamental interest as well as those concerning industrial applications, emerging instrumental techniques and those which have some practical application to an aspect of human endeavor, such as the preservation of the environment, health, waste disposal are welcome. The overall focus is on original and rigorous scientific research results which have generic significance.
Ain Shams Engineering Journal focuses upon aspects of mechanical engineering, electrical engineering, civil engineering, chemical engineering, petroleum engineering, environmental engineering, architectural and urban planning engineering. Papers in which knowledge from other disciplines is integrated with engineering are especially welcome like nanotechnology, material sciences, and computational methods as well as applied basic sciences: engineering mathematics, physics and chemistry.