Data Augmentation for Arabic Speech Recognition Based on End-to-End Deep Learning

International Journal of Intelligent Computing and Information Sciences. Pub Date: 2021-07-19. DOI: 10.21608/IJICIS.2021.73581.1086

Hamzah A. Alsayadi, A. Abdelhamid, I. Hegazy, Zaki Taha

Citations: 6

Abstract

End-to-end deep learning has greatly improved the performance of speech recognition systems. With deep learning techniques, overfitting remains the main problem when little training data is available. Data augmentation is a suitable remedy for overfitting: it increases the quantity of training data and improves the robustness of the models. In this paper, we investigate a data augmentation method for improving Arabic automatic speech recognition (ASR) based on end-to-end deep learning. Data augmentation is applied to the original corpus to increase the training data through noise adaptation, pitch shifting, and speed transformation. A CNN-LSTM and an attention-based encoder-decoder are used to build the acoustic model and in the decoding phase. This method is considered state of the art in end-to-end deep learning, and to the best of our knowledge, no prior research has employed data augmentation for CNN-LSTM and attention-based models in Arabic ASR systems. In addition, the language model is built using RNN-LM and LSTM-LM methods. The Standard Arabic Single Speaker Corpus (SASSC) without diacritics is used as the original corpus. Experimental results show that applying data augmentation reduces the word error rate (WER) compared with the same approach without data augmentation. The average reduction in WER achieved is 4.55%.
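
To make the augmentation step concrete, the following is a minimal sketch of two of the three transformations named in the abstract: additive noise and speed change. It is not the authors' implementation. The function names, the speed factors (0.9 and 1.1), the 20 dB signal-to-noise ratio, and the use of white Gaussian noise are all assumptions chosen for illustration; the abstract does not state these settings. Pitch shifting is omitted because it normally relies on a dedicated signal-processing library.

```python
import math
import random


def add_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled to a target signal-to-noise ratio in dB.

    Assumes a non-empty list of float samples. White noise is an illustrative
    choice; the paper's "noise adaptation" may use a different noise source.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def change_speed(signal, factor):
    """Resample by linear interpolation; factor > 1 yields shorter, faster audio.

    Plain resampling changes pitch together with tempo.
    """
    last = len(signal) - 1
    n_out = max(1, int(round(len(signal) / factor)))
    out = []
    for i in range(n_out):
        pos = i * factor               # fractional position in the input
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


def augment(signal, speed_factors=(0.9, 1.1), snr_db=20.0):
    """Return the original signal plus one variant per augmentation setting."""
    variants = [signal]
    for factor in speed_factors:
        variants.append(change_speed(signal, factor))
    variants.append(add_noise(signal, snr_db))
    return variants


# Usage: a one-second 440 Hz tone at 16 kHz stands in for a real utterance.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
variants = augment(tone)
```

Each variant keeps the transcript of the original utterance, so under these assumed settings one recording yields four training examples. This is the sense in which augmentation increases the quantity of training data without new recordings.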
