Developing Concatenative Based Text to Speech Synthesizer for Tigrigna Language

Internet of Things and Cloud Computing Pub Date : 2020-10-28 DOI:10.11648/j.iotcc.20200802.12

Mezgebe Araya Keletay, Hussien Seid Worku


Citations: 2

Abstract

A Text-To-Speech (TTS) synthesizer is a computer-based system that reads arbitrary text and converts it into speech resembling, as closely as possible, that of a native speaker of the language. This thesis describes the first TTS system for the Tigrigna language, built on a speech synthesis architecture in MATLAB. The system is based on concatenative synthesis and applies the linear predictive coding (LPC) technique. The performance of the system is measured, and the quality of the synthesized speech is assessed in terms of intelligibility and naturalness. The synthesizer is evaluated at two levels: the word level and the sentence level. At the word level, the output is evaluated with the online NeoSpeech tool; most of the words are recognizable, and the overall performance is found to be 78%. At the sentence level, intelligibility and naturalness are measured on the mean opinion score (MOS) scale, and the overall scores are 3.28 and 3.27, respectively. These values are encouraging and show that diphone speech units are good candidates for developing a fully functional speech synthesizer. There are, however, areas that can be improved: a text analyzer that handles the pronunciation of zonal dialects of the language, and a prosody generator, both need further investigation.
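To make the idea of diphone-based concatenative synthesis concrete, the following is a minimal Python sketch, not the authors' MATLAB implementation. It assumes a hypothetical inventory that maps diphone names such as `s-a` to recorded waveforms (plain lists of samples), uses `_` as an assumed silence label, and smooths each joint with a simple linear crossfade. The LPC processing mentioned in the abstract is not modelled here, and the function names and phoneme labels are illustrative only.

```python
from typing import Dict, List, Sequence


def to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Turn a phoneme sequence into diphone names, padded with silence '_'."""
    padded = ["_", *phonemes, "_"]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def crossfade_concat(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join waveforms, linearly crossfading up to `overlap` samples per joint."""
    out: List[float] = []
    for unit in units:
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(unit))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # fade-in weight of the incoming unit
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


def synthesize(
    phonemes: Sequence[str],
    inventory: Dict[str, Sequence[float]],  # hypothetical diphone database
    overlap: int = 2,
) -> List[float]:
    """Look up each diphone in the inventory and concatenate the units."""
    names = to_diphones(phonemes)
    missing = [name for name in names if name not in inventory]
    if missing:
        raise KeyError(f"no recorded unit for: {missing}")
    return crossfade_concat([inventory[name] for name in names], overlap)


if __name__ == "__main__":
    # Toy inventory with made-up sample values, for illustration only.
    toy_inventory = {
        "_-s": [0.0, 0.1, 0.2, 0.3],
        "s-a": [0.3, 0.5, 0.5, 0.4],
        "a-_": [0.4, 0.2, 0.1, 0.0],
    }
    waveform = synthesize(["s", "a"], toy_inventory, overlap=2)
    print(len(waveform), "samples")
```

A diphone spans the transition from the middle of one phoneme to the middle of the next, so joints fall in the relatively stable centre of each sound; this is the usual motivation for choosing diphones as the concatenation unit. A real system would additionally need text analysis and prosody control, which the abstract identifies as future work.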

