A research bed for unit selection based text to speech synthesis

2008 IEEE Spoken Language Technology Workshop Pub Date : 2008-12-01 DOI:10.1109/SLT.2008.4777882

K. Sarathy, A. Ramakrishnan

{"title":"A research bed for unit selection based text to speech synthesis","authors":"K. Sarathy, A. Ramakrishnan","doi":"10.1109/SLT.2008.4777882","DOIUrl":null,"url":null,"abstract":"The paper describes a modular, unit selection based TTS framework, which can be used as a research bed for developing TTS in any new language, as well as studying the effect of changing any parameter during synthesis. Using this framework, TTS has been developed for Tamil. Synthesis database consists of 1027 phonetically rich pre-recorded sentences. This framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications like mobiles and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resultant synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that with uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, thus proposing to exploit the misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with their confused ones.","PeriodicalId":186876,"journal":{"name":"2008 IEEE Spoken Language Technology Workshop","volume":"20 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE Spoken Language Technology Workshop","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT.2008.4777882","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

Abstract

The paper describes a modular, unit selection based TTS framework, which can be used as a research bed for developing TTS in any new language, as well as studying the effect of changing any parameter during synthesis. Using this framework, TTS has been developed for Tamil. Synthesis database consists of 1027 phonetically rich pre-recorded sentences. This framework has already been tested for Kannada. Our TTS synthesizes intelligible and acceptably natural speech, as supported by high mean opinion scores. The framework is further optimized to suit embedded applications like mobiles and PDAs. We compressed the synthesis speech database with standard speech compression algorithms used in commercial GSM phones and evaluated the quality of the resultant synthesized sentences. Even with a highly compressed database, the synthesized output is perceptually close to that with uncompressed database. Through experiments, we explored the ambiguities in human perception when listening to Tamil phones and syllables uttered in isolation, thus proposing to exploit the misperception to substitute for missing phone contexts in the database. Listening experiments have been conducted on sentences synthesized by deliberately replacing phones with their confused ones.

查看原文本刊更多论文

基于单元选择的文本到语音合成研究平台

本文描述了一个模块化的、基于单元选择的TTS框架，它可以作为一个研究平台，用于开发任何新语言的TTS，以及研究在合成过程中改变任何参数的影响。使用这个框架，TTS已经为泰米尔开发出来了。合成数据库由1027个语音丰富的预录句子组成。这个框架已经在卡纳达语中进行了测试。我们的TTS综合了可理解和可接受的自然语音，并得到了高平均意见分数的支持。该框架进一步优化，以适应嵌入式应用，如手机和pda。我们使用商用GSM电话中使用的标准语音压缩算法压缩合成语音数据库，并评估合成句子的质量。即使使用高度压缩的数据库，合成的输出在感知上也与未压缩的数据库接近。通过实验，我们探索了人类在听泰米尔语电话和孤立音节时感知的模糊性，从而提出利用误解来替代数据库中缺失的电话上下文。通过故意用混淆的电话代替电话合成的句子进行了听力实验。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2008 IEEE Spoken Language Technology Workshop

自引率

0.00%

发文量