Novel approach for quality enhancement of Arabic text to speech synthesis

Oumaima Zine, A. Meziane
DOI: 10.1109/ATSIP.2017.8075550 (https://doi.org/10.1109/ATSIP.2017.8075550)
Venue: 2017 International Conference on Advanced Technologies for Signal and Image Processing (ATSIP)
Publication date: 2017-05-22
Citations: 6

Abstract

Text-to-speech (TTS) technologies are widely used in applications that help users with special needs, such as people who are blind or deaf, individuals with severe speech impairments, and people with dyslexia. In this context, TTS researchers have focused on achieving a high level of intelligibility for many languages, such as French and English. Arabic TTS, however, is still in its early development stages and needs improvement to reach high quality. In this paper, we describe a novel concatenative approach based on lemmas and Arabic patterns. We also propose and adopt an alternative method for synthesizing diacritized Arabic text, using a set of sub-segments in which the consonant is treated as the nucleus of the acoustic unit and is therefore taken together with its vocalic context. This speech unit consists of half vowel, consonant, half vowel, adapted to the different positions in the word (initial, medial, and final). We also describe a reduction process applied to the resulting combinations of the proposed acoustic units, in order to reduce the theoretical number of generated sub-segment models. Furthermore, we present a speech corpus design for Arabic text-to-speech (ATTS) based on pre-recorded audiobooks from the Masmoo3 Audiobooks website. The corpus contains more than 4 hours of continuous Modern Standard Arabic (MSA) speech, recorded with high intelligibility and providing phonetically balanced sentences. The proposed approach was evaluated using the Diagnostic Rhyme Test (DRT), which measures the intelligibility of synthesized speech at the word level. A sentence-level test was also conducted. The results of both tests are presented in the experiments and results section.
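The half vowel, consonant, half vowel unit described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the romanized vowel inventory, the `Unit` type, the `segment` function, and the rule that position is decided by the order of consonants within the word are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical romanized vowel inventory (short and long vowels);
# the paper's actual phoneme set is not given in the abstract.
VOWELS = {"a", "i", "u", "aa", "ii", "uu"}


@dataclass(frozen=True)
class Unit:
    left: Optional[str]   # vowel whose second half precedes the consonant, if any
    consonant: str        # nucleus of the unit
    right: Optional[str]  # vowel whose first half follows the consonant, if any
    position: str         # "initial", "medial", or "final" (assumed labelling rule)


def segment(phonemes: List[str]) -> List[Unit]:
    """Split one word into consonant-centred units with their vocalic context."""
    consonant_idx = [i for i, p in enumerate(phonemes) if p not in VOWELS]
    units = []
    for i in consonant_idx:
        # Take the neighbouring vowel on each side, when there is one.
        left = phonemes[i - 1] if i > 0 and phonemes[i - 1] in VOWELS else None
        right = (
            phonemes[i + 1]
            if i + 1 < len(phonemes) and phonemes[i + 1] in VOWELS
            else None
        )
        # Assumed rule: first consonant is initial, last is final, others medial.
        if i == consonant_idx[0]:
            position = "initial"
        elif i == consonant_idx[-1]:
            position = "final"
        else:
            position = "medial"
        units.append(Unit(left, phonemes[i], right, position))
    return units


if __name__ == "__main__":
    # "kataba" as a romanized, fully vowelled phoneme sequence.
    for unit in segment(["k", "a", "t", "a", "b", "a"]):
        print(unit)
```

In this sketch each vowel is shared by the two consonants around it, which is one way to read "half vowel" on either side of the consonant nucleus. A real system would also need to handle consonant clusters, gemination, and the reduction step mentioned in the abstract.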