Novel approach for quality enhancement of Arabic text to speech synthesis
Oumaima Zine, A. Meziane
2017 International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)
Publication date: 2017-05-22
DOI: 10.1109/ATSIP.2017.8075550 (https://doi.org/10.1109/ATSIP.2017.8075550)
Citations: 6
Abstract
Text To Speech (TTS) technologies are widely used in applications that help users with special needs, such as blind and deafened users, individuals with severe speech impairments, and dyslexics. In this context, TTS researchers have focused on achieving a high level of intelligibility for many languages, such as French and English. However, Arabic TTS is still in its early stages of development and needs improvement to reach high quality. In this paper, we describe a novel concatenative approach based on lemmas and Arabic patterns. Moreover, an alternative method for synthesizing diacritized Arabic text is proposed and adopted. It uses a set of sub-segments in which the consonant is considered the nucleus of the acoustic unit and is therefore taken together with its vocalic context. This speech unit consists of a half vowel, a consonant, and a half vowel, adapted to the different positions in the word (initial, medial, and final). A reduction process applied to the resulting combinations of the proposed acoustic units is also described, in order to reduce the theoretical number of generated sub-segment models. Furthermore, a speech corpus design for Arabic Text To Speech (ATTS), based on pre-recorded audiobooks from the Masmoo3 Audiobooks website, is presented. The corpus contains more than 4 hours of continuous Modern Standard Arabic (MSA) speech, recorded with high intelligibility and providing phonetically balanced sentences. The proposed approach was evaluated using the Diagnostic Rhyme Test (DRT), which measures the intelligibility of synthesized speech at the word level. A sentence-level test was also conducted. The results of both tests are presented in the experiments and results section.
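The consonant-centred unit described in the abstract can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the authors' implementation. The romanized vowel set, the `Unit` type, the `segment` function, and the rule that assigns a position by the consonant's order within the word are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional

# Assumed toy vowel inventory (short and long vowels in an ad hoc
# romanization); the paper's actual phoneme set is not given in the abstract.
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}


class Unit(NamedTuple):
    left: Optional[str]   # vowel whose second half precedes the consonant
    consonant: str        # nucleus of the unit
    right: Optional[str]  # vowel whose first half follows the consonant
    position: str         # "initial", "medial" or "final" (assumed rule)


def segment(phonemes: List[str]) -> List[Unit]:
    """Split a phoneme sequence into half vowel-consonant-half vowel units."""
    consonant_idx = [i for i, p in enumerate(phonemes) if p not in VOWELS]
    units = []
    for n, i in enumerate(consonant_idx):
        # Take the neighbouring vowel on each side, if there is one.
        left = phonemes[i - 1] if i > 0 and phonemes[i - 1] in VOWELS else None
        has_next = i + 1 < len(phonemes)
        right = phonemes[i + 1] if has_next and phonemes[i + 1] in VOWELS else None
        # Assumed rule: position follows the consonant's order in the word.
        if n == 0:
            position = "initial"
        elif n == len(consonant_idx) - 1:
            position = "final"
        else:
            position = "medial"
        units.append(Unit(left, phonemes[i], right, position))
    return units


if __name__ == "__main__":
    # "kataba" written as a phoneme list in the assumed romanization.
    for unit in segment(["k", "a", "t", "a", "b", "a"]):
        print(unit)
```

In this sketch each vowel is shared by the two consonants around it, which is why only half of it belongs to each unit. Consonants with no adjacent vowel get `None` on that side.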