Towards Building a Speech Recognition System for Quranic Recitations: A Pilot Study Involving Female Reciters.

Impact Factor: 0.7 · JCR Quartile: Q4 (Computer Science, Information Systems)
Suha A. Issa, Mahmoud Ayyoub, O. Khaleel, N. Elmitwally
Journal: Jordan Journal of Electrical Engineering
DOI: 10.5455/jjee.204-1612774767
Published: 2022-01-01 (Journal Article)
Citations: 1

Abstract

This paper is the first step in an effort toward building an automatic speech recognition (ASR) system for Quranic recitations that caters specifically to female reciters. To function properly, ASR systems require a huge amount of training data. Surprisingly, the data readily available for Quranic recitations suffer from major limitations. Specifically, the currently available audio recordings of Quran recitations are massive in volume, but they are mostly made by male reciters (who have dedicated most of their lives to perfecting their recitation skills) using professional and expensive equipment. Such proficiency in the training data (along with the fact that the reciters come from a specific demographic group: adult males) will most likely introduce bias into the resulting model and limit its ability to process input from other groups, such as non-/semi-professionals, females, or children. This work aims to explore this shortcoming empirically. To do so, we create a first-of-its-kind (to the best of our knowledge) benchmark dataset called the Quran Recitations by Females and Males (QRFAM) dataset. QRFAM is a relatively large dataset of audio recordings made by male and female reciters from different age groups and proficiency levels. After creating the dataset, we experiment on it by building ASR systems based on one of the most popular open-source ASR models: Mozilla's celebrated DeepSpeech. The speaker-independent end-to-end models that we produce are evaluated using word error rate (WER).
Despite DeepSpeech's well-known flexibility and strong performance when trained and tested on recitations from the same group, models trained on the recitations of one group could not recognize most of the recitations of the other groups in the testing phase. This shows that there is still a long way to go before an ASR system usable by anyone can be produced, and that the first step is to build and expand the resources needed for this task, such as QRFAM. We hope that our work is a first step in this direction and that it inspires the community to take more interest in this problem.
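The models above are scored with word error rate (WER). As a minimal illustration of the metric (not code from the paper), WER is the word-level Levenshtein edit distance between the reference transcript and the hypothesis, normalized by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # deleting all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown dog"))  # → 0.25
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why cross-group evaluations like the one in this paper can yield models that "could not recognize most of the recitations".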