Speaker-Independent Phoneme-Based Automatic Quranic Speech Recognition Using Deep Learning

Impact Factor: 3.6 · CAS Tier 3 (Computer Science) · JCR Q2 (COMPUTER SCIENCE, INFORMATION SYSTEMS)
Samah Al-Zaro;Mahmoud Al-Ayyoub;Osama Al-Khaleel
DOI: 10.1109/ACCESS.2025.3589252
Journal: IEEE Access, vol. 13, pp. 125881-125896
Published: 2025-07-15 (Journal Article)
Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11080439
Article page: https://ieeexplore.ieee.org/document/11080439/
Citations: 0

Abstract

An automatic speech recognition system is important to help Muslims recite the Holy Quran accurately. Most existing research ignores a wide range of potential users (reciters) in their systems by focusing on professional adult male reciters due to the abundance of this group’s recordings and the lack of annotated data for other groups. This work bridges this gap by developing a speaker-independent system that recognizes Quranic recitations of different genders, ages, accents, and Tajweed levels. Our recognizer is designed on the phoneme level to offer Tajweed detection. Using a private dataset, rich of non-transcribed recitations, we propose training the DeepSpeech model with Transfer Learning and semi-supervised learning techniques. The performance of our model is evaluated using several proposed language models and evaluation metrics, including Word Error Rate (WER) and Phoneme Error Rate (PER). The goal is to show how our model would perform in regard to diverse reciter groups. Starting with a typical test set of unseen professional adult male recitations, the WER/PER of our model are 3.11% and 6.18%, respectively. More interestingly, our model achieves a WER of 25.39% and 17.93% when tested on recitations of non-professional (normal) females and children, respectively. The results are very promising and ensure the ability of our model to recognize recitations of various groups of normal reciters. Moreover, the latter results were done on the public “in-the-wild” Tarteel dataset, hoping this will be useful for comparison with future research and building more practical recitation teaching applications. In fact, a major limitation of existing systems (including ours) is the ability to handle diverse in-the-wild scenarios, such as when the reciter is reciting the verses in a very high tempo (common for those trying to memorize the Quran.
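The WER and PER figures quoted above are both instances of the standard edit-distance error rate: the Levenshtein distance between the reference and hypothesis token sequences, divided by the reference length. As a minimal sketch (the transliterated example phrase is illustrative only, not drawn from the paper's dataset):

```python
def edit_distance(ref, hyp):
    # Dynamic-programming Levenshtein distance over token sequences.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def error_rate(ref_tokens, hyp_tokens):
    # WER when tokens are words; PER when tokens are phonemes.
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)

ref = "bismi allahi alrrahmani alrraheemi".split()
hyp = "bismi allah alrrahmani alrraheemi".split()
print(f"WER = {error_rate(ref, hyp):.2%}")  # 1 substitution over 4 words -> 25.00%
```

The same `error_rate` function yields PER when the sequences are phoneme symbols rather than words, which is what makes a phoneme-level recognizer like the one described here evaluable with both metrics.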
Source journal: IEEE Access
Fields: Computer Science, Information Systems; Engineering, Electrical & Electronic
CiteScore: 9.80
Self-citation rate: 7.70%
Annual publications: 6673
Review time: 6 weeks
Journal description: IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest. IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary", in that reviewers will either Accept or Reject an article in the form it is submitted in order to achieve rapid turnaround. Especially encouraged are submissions on: multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals; practical articles discussing new experiments or measurement techniques, and interesting solutions to engineering problems; development of new or improved fabrication or manufacturing techniques; and reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.