The effect of diacritization on Arabic speech recognition

2017 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT). Pub Date: 2017-10-01. DOI: 10.1109/AEECT.2017.8257758

Fawaz S. Al-Anzi, Dia AbuZeina


Citations: 4

Abstract

Arabic automatic speech recognition (ASR) is a successful application of natural language processing (NLP). However, formal Arabic text is generally written without diacritics, so a single written form can correspond to several pronunciations. That is, the Arabic writing system allows short vowels to be omitted, forcing the reader to rely on prior knowledge and word context to infer the missing diacritics. For speech recognition, there are two options for the textual training data: diacritized (also called vowelized) text or non-diacritized text. Using non-diacritized text may pose a challenge for Arabic ASR, because the missing short vowels can introduce confusion into the learning process. This ambiguity yields a less-than-optimal acoustic model, which is one of the most important components of an ASR system. In this paper, we compare recognition performance with diacritized and non-diacritized text. In the experiments, we used the Carnegie Mellon University (CMU) PocketSphinx speech recognizer and a new in-house Modern Standard Arabic (MSA) continuous speech corpus containing 13.5 hours for training and 4.1 hours for testing. The text of the corpus was manually diacritized. For acoustic modelling, we used phonetic tied-mixture (PTM) models. The experimental results show that the non-diacritized system achieved an accuracy of 76.4% (i.e., 1 − word error rate (WER)), while the diacritized system achieved 63.8%. The lower accuracy of the diacritized system is due to slight differences in diacritics; the non-diacritized output, however, may be adequate and error-free for native Arabic speakers.
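
Because the scores above are reported as 1 − WER, accuracies of 76.4% and 63.8% correspond to WERs of 23.6% and 36.2%. The Python sketch below illustrates how a single diacritic mismatch can count as a full word error when scoring against diacritized text, yet disappear once diacritics are stripped. It is not the authors' scoring code: the function names `strip_diacritics` and `word_error_rate` and the example sentences are hypothetical, and it assumes that diacritics are the code points U+064B to U+0652 and that words are separated by whitespace.

```python
# Illustrative sketch only; names and example data are assumptions,
# not taken from the paper.

# Arabic short-vowel and related marks: tanween, fatha, damma, kasra,
# shadda, sukun (Unicode U+064B..U+0652).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0652 + 1)}


def strip_diacritics(text: str) -> str:
    """Return text with the Arabic diacritic marks removed."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical reference and recognizer output; they differ only in
    # the diacritics of the first word (kataba vs. kutiba).
    reference = "كَتَبَ الوَلَدُ الدَّرْسَ"
    hypothesis = "كُتِبَ الوَلَدُ الدَّرْسَ"

    wer_diacritized = word_error_rate(reference, hypothesis)
    wer_plain = word_error_rate(
        strip_diacritics(reference), strip_diacritics(hypothesis)
    )
    print(f"diacritized accuracy: {1 - wer_diacritized:.3f}")
    print(f"non-diacritized accuracy: {1 - wer_plain:.3f}")
```

Scoring both systems after stripping diacritics from references and hypotheses is one way to compare them on equal terms; whether the paper did so is not stated in the abstract.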

