{"title":"Speaker-Independent Phoneme-Based Automatic Quranic Speech Recognition Using Deep Learning","authors":"Samah Al-Zaro;Mahmoud Al-Ayyoub;Osama Al-Khaleel","doi":"10.1109/ACCESS.2025.3589252","DOIUrl":null,"url":null,"abstract":"An automatic speech recognition system is important to help Muslims recite the Holy Quran accurately. Most existing research ignores a wide range of potential users (reciters) in their systems by focusing on professional adult male reciters due to the abundance of this group’s recordings and the lack of annotated data for other groups. This work bridges this gap by developing a speaker-independent system that recognizes Quranic recitations of different genders, ages, accents, and Tajweed levels. Our recognizer is designed on the phoneme level to offer Tajweed detection. Using a private dataset, rich of non-transcribed recitations, we propose training the DeepSpeech model with Transfer Learning and semi-supervised learning techniques. The performance of our model is evaluated using several proposed language models and evaluation metrics, including Word Error Rate (WER) and Phoneme Error Rate (PER). The goal is to show how our model would perform in regard to diverse reciter groups. Starting with a typical test set of unseen professional adult male recitations, the WER/PER of our model are 3.11% and 6.18%, respectively. More interestingly, our model achieves a WER of 25.39% and 17.93% when tested on recitations of non-professional (normal) females and children, respectively. The results are very promising and ensure the ability of our model to recognize recitations of various groups of normal reciters. Moreover, the latter results were done on the public “in-the-wild” Tarteel dataset, hoping this will be useful for comparison with future research and building more practical recitation teaching applications. In fact, a major limitation of existing systems (including ours) is the ability to handle diverse in-the-wild scenarios, such as when the reciter is reciting the verses in a very high tempo (common for those trying to memorize the Quran.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"125881-125896"},"PeriodicalIF":3.4000,"publicationDate":"2025-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11080439","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Access","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11080439/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
An automatic speech recognition system is important to help Muslims recite the Holy Quran accurately. Most existing research ignores a wide range of potential users (reciters) by focusing on professional adult male reciters, owing to the abundance of this group’s recordings and the lack of annotated data for other groups. This work bridges this gap by developing a speaker-independent system that recognizes Quranic recitations by reciters of different genders, ages, accents, and Tajweed levels. Our recognizer operates at the phoneme level to support Tajweed detection. Using a private dataset rich in non-transcribed recitations, we propose training the DeepSpeech model with transfer learning and semi-supervised learning techniques. The performance of our model is evaluated using several proposed language models and evaluation metrics, including Word Error Rate (WER) and Phoneme Error Rate (PER). The goal is to show how our model performs across diverse reciter groups. On a typical test set of unseen professional adult male recitations, the WER and PER of our model are 3.11% and 6.18%, respectively. More interestingly, our model achieves WERs of 25.39% and 17.93% when tested on recitations by non-professional (normal) female and child reciters, respectively. The results are very promising and confirm the ability of our model to recognize recitations by various groups of normal reciters. Moreover, the latter results were obtained on the public “in-the-wild” Tarteel dataset, which we hope will be useful for comparison in future research and for building more practical recitation-teaching applications. In fact, a major limitation of existing systems (including ours) is the ability to handle diverse in-the-wild scenarios, such as when the reciter recites the verses at a very high tempo (common for those trying to memorize the Quran).
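For context, WER and PER are both normalized edit-distance metrics: the number of substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the reference length, with words or phonemes as the tokens. The sketch below is a minimal, illustrative Python implementation, not the authors' evaluation code; the one-character-per-phoneme tokenization in `error_rate` is an assumption made for the example.

```python
# Minimal sketch of WER/PER as normalized Levenshtein distance.
# Illustrative only; not the paper's evaluation code.

def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all remaining reference tokens
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all hypothesis tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # match/substitution
    return dp[-1][-1]

def error_rate(reference, hypothesis, unit="word"):
    """WER for unit='word'; PER for unit='phoneme' (here each
    non-space character stands in for one phoneme, an assumption)."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref = list(reference.replace(" ", ""))
        hyp = list(hypothesis.replace(" ", ""))
    return edit_distance(ref, hyp) / max(len(ref), 1)

# One substituted word out of four: WER = 25.00%
print(f"{error_rate('qul huwa allahu ahad', 'qul huwa allah ahad'):.2%}")
```

Because a single wrong phoneme makes the whole word count as an error, WER and PER computed on the same output can diverge noticeably, which is presumably why the paper reports both.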
IEEE Access · COMPUTER SCIENCE, INFORMATION SYSTEMS · ENGINEERING, ELECTRICAL & ELECTRONIC
CiteScore
9.80
Self-citation rate
7.70%
Articles published per year
6673
Review time
6 weeks
About the journal:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, interesting solutions to engineering problems.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.