Deep Learning Methods for Arabic Autoencoder Speech Recognition System for Electro-Larynx Device

Adv. Hum. Comput. Interact. Pub Date: 2023-02-28 DOI: 10.1155/2023/7398538

Z. J. M. Ameen, A. Kadhim


Citations: 1

Abstract


Recent advances in speech recognition have achieved performance comparable to that of human transcribers. This performance does not extend to all spoken languages, however, and Arabic is one of the languages for which it falls short. Arabic speech recognition is limited by the lack of suitable datasets, although artificial intelligence algorithms have shown promising capabilities for the task. Arabic is the official language of 22 countries, and an estimated 400 million people speak it worldwide. Speech disabilities have been a growing problem in recent decades, even among children. Some devices can generate speech for people with these disabilities; one of them is the Servox Digital Electro-Larynx (EL). In this research, we developed an autoencoder that combines long short-term memory (LSTM) and gated recurrent unit (GRU) models to recognize signals recorded from the Servox Digital EL. The proposed framework consists of three steps: denoising, feature extraction, and Arabic speech recognition. We evaluated different combinations of LSTM and GRU to construct the best autoencoder, and a rigorous evaluation indicates better performance when GRU is used in both the encoder and the decoder. The proposed model achieved 95.31% accuracy for Arabic speech recognition and a 4.69% word error rate (WER). The experimental results indicate that the proposed model can be used to develop a real-time app that recognizes common Arabic spoken words.
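The abstract reports a word error rate but does not describe how it was scored. As a minimal sketch, assuming the conventional definition (word-level edit distance summed over all utterances, divided by the total number of reference words), WER can be computed as follows. The function names and the sample word pairs are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (substitute, delete, insert)."""
    # prev[j] holds the distance between the reference prefix processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_error_rate(pairs):
    """Corpus-level WER over (reference, hypothesis) transcript pairs."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        total_edits += edit_distance(ref_words, hypothesis.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return total_edits / total_words


# Hypothetical isolated-word test set: one of four words is misrecognized.
isolated = [("one", "one"), ("two", "two"), ("three", "tree"), ("four", "four")]
print(word_error_rate(isolated))  # 0.25
```

When every utterance is a single word, each error is one substitution, so WER equals one minus accuracy. The two figures reported in the abstract (95.31% and 4.69%) sum to 100%, which would be consistent with that isolated-word setting, though the abstract does not state it.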


Source journal

Adv. Hum. Comput. Interact.

Self-citation rate

0.00%

Number of published articles