Arabic speech recognition using end-to-end deep learning

Hamzah A. Alsayadi, A. Abdelhamid, I. Hegazy, Z. Fayed
Journal: IET Signal Process.
Publication date: 2021-06-02
Publication type: Journal Article
DOI: 10.1049/SIL2.12057 (https://doi.org/10.1049/SIL2.12057)
Source: Semantic Scholar
Citations: 22

Abstract

Arabic automatic speech recognition (ASR) methods with diacritics can be integrated with other systems more readily than Arabic ASR methods without diacritics. In this work, the application of state-of-the-art end-to-end deep learning approaches is investigated to build a robust diacritised Arabic ASR. These approaches use Mel-Frequency Cepstral Coefficients and log Mel-Scale Filter Bank energies as acoustic features. To the best of our knowledge, end-to-end deep learning approaches have not previously been used for diacritised Arabic automatic speech recognition. To fill this gap, this work presents a new CTC-based ASR, CNN-LSTM, and an attention-based end-to-end approach for improving diacritised Arabic ASR. In addition, a word-based language model is employed to achieve better results. The end-to-end approaches applied in this work are based on state-of-the-art frameworks, namely ESPnet and Espresso. Training and testing of these frameworks are performed on the Standard Arabic Single Speaker Corpus (SASSC), which contains 7 h of Modern Standard Arabic speech. Experimental results show that the CNN-LSTM
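
For readers unfamiliar with the CTC-based approach named in the abstract, the following is a minimal sketch of CTC best-path (greedy) decoding: take the top-scoring symbol in each frame, merge consecutive repeats, then drop the blank symbol. It is not the ESPnet or Espresso API and not the authors' decoder. The function name `ctc_greedy_decode`, the toy vocabulary, the blank index, and the choice to treat a diacritic as its own output symbol are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, vocab):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    # Index of the highest-scoring symbol in every frame.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Collapse runs of the same index into a single occurrence.
    collapsed = [idx for idx, _ in groupby(best_path)]
    # Remove blanks; what remains is the output label sequence.
    return "".join(vocab[idx] for idx in collapsed if idx != BLANK)


# Toy vocabulary (assumption): blank, ARABIC LETTER BEH, ARABIC FATHA.
vocab = ["<blank>", "\u0628", "\u064E"]

# Made-up per-frame scores over the three symbols, one row per frame.
frame_scores = [
    [0.10, 0.80, 0.10],  # beh
    [0.20, 0.70, 0.10],  # beh (repeat, merged)
    [0.90, 0.05, 0.05],  # blank
    [0.10, 0.20, 0.70],  # fatha
    [0.10, 0.10, 0.80],  # fatha (repeat, merged)
    [0.60, 0.30, 0.10],  # blank
    [0.20, 0.70, 0.10],  # beh
]

decoded = ctc_greedy_decode(frame_scores, vocab)
print(decoded)
```

The blank between the two runs is what lets the same label appear twice in the output: without it, adjacent repeats would be merged into one symbol. Systems that add a language model, as the abstract describes, typically replace this greedy rule with a beam search.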