{"title":"文本到语音合成中的问题","authors":"M. Macchi","doi":"10.1109/IJSIS.1998.685467","DOIUrl":null,"url":null,"abstract":"The ultimate goal of text-to-speech synthesis is to convert ordinary orthographic text into an acoustic signal that is indistinguishable from human speech. Originally, synthesis systems were architected around a system of rules and models that were based on research on human language and speech production and perception processes. The quality of speech produced by such systems is inherently limited by the quality of the rules and the models. Given that our knowledge of human speech processes is still incomplete, the quality of text-to-speech is far from natural-sounding. Hence, today's interest in high quality speech for applications, in combination with advances in computer resource, has caused the focus to shift from rules and model-based methods to corpus-based methods that presumably bypass rules and models. For example, many systems now rely on large word pronunciation dictionaries instead of letter-to-phoneme rules and large prerecorded sound inventories instead of rules predicting the acoustic correlates of phonemes. Because of the need to analyze large amounts of data, this approach relies on automated techniques such as those used in automatic speech recognition.","PeriodicalId":289764,"journal":{"name":"Proceedings. IEEE International Joint Symposia on Intelligence and Systems (Cat. No.98EX174)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"Issues in text-to-speech synthesis\",\"authors\":\"M. Macchi\",\"doi\":\"10.1109/IJSIS.1998.685467\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The ultimate goal of text-to-speech synthesis is to convert ordinary orthographic text into an acoustic signal that is indistinguishable from human speech. Originally, synthesis systems were architected around a system of rules and models that were based on research on human language and speech production and perception processes. The quality of speech produced by such systems is inherently limited by the quality of the rules and the models. Given that our knowledge of human speech processes is still incomplete, the quality of text-to-speech is far from natural-sounding. Hence, today's interest in high quality speech for applications, in combination with advances in computer resource, has caused the focus to shift from rules and model-based methods to corpus-based methods that presumably bypass rules and models. For example, many systems now rely on large word pronunciation dictionaries instead of letter-to-phoneme rules and large prerecorded sound inventories instead of rules predicting the acoustic correlates of phonemes. Because of the need to analyze large amounts of data, this approach relies on automated techniques such as those used in automatic speech recognition.\",\"PeriodicalId\":289764,\"journal\":{\"name\":\"Proceedings. IEEE International Joint Symposia on Intelligence and Systems (Cat. No.98EX174)\",\"volume\":\"15 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. IEEE International Joint Symposia on Intelligence and Systems (Cat. 
No.98EX174)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IJSIS.1998.685467\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. IEEE International Joint Symposia on Intelligence and Systems (Cat. No.98EX174)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IJSIS.1998.685467","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
The ultimate goal of text-to-speech synthesis is to convert ordinary orthographic text into an acoustic signal that is indistinguishable from human speech. Originally, synthesis systems were architected around a system of rules and models based on research into human language and into speech production and perception processes. The quality of speech produced by such systems is inherently limited by the quality of the rules and the models. Given that our knowledge of human speech processes is still incomplete, the quality of text-to-speech is far from natural-sounding. Hence, today's interest in high-quality speech for applications, in combination with advances in computing resources, has caused the focus to shift from rule- and model-based methods to corpus-based methods that presumably bypass rules and models. For example, many systems now rely on large word pronunciation dictionaries instead of letter-to-phoneme rules, and on large prerecorded sound inventories instead of rules predicting the acoustic correlates of phonemes. Because of the need to analyze large amounts of data, this approach relies on automated techniques such as those used in automatic speech recognition.
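To make the contrast between the two pronunciation strategies concrete, here is a minimal sketch, not taken from the paper or any particular system: a corpus-style word pronunciation dictionary consulted first, with a deliberately crude letter-to-phoneme rule table as the fallback. All names, entries, and phoneme symbols are illustrative assumptions.

```python
# Illustrative only: dictionary-based lookup (corpus approach) with a naive
# letter-to-phoneme fallback (rule approach). Real rule systems are
# context-sensitive (e.g. "ph" -> F, silent final "e"); this is not.

PRONUNCIATION_DICT = {
    "speech": ["S", "P", "IY", "CH"],
    "text": ["T", "EH", "K", "S", "T"],
}

# One-letter-to-one-phoneme rules, purely for demonstration.
LETTER_TO_PHONEME = {
    "a": "AE", "e": "EH", "i": "IH", "o": "AA", "u": "AH",
    "b": "B", "c": "K", "d": "D", "f": "F", "g": "G", "h": "HH",
    "j": "JH", "k": "K", "l": "L", "m": "M", "n": "N", "p": "P",
    "q": "K", "r": "R", "s": "S", "t": "T", "v": "V", "w": "W",
    "x": "K S", "y": "Y", "z": "Z",
}

def to_phonemes(word: str) -> list[str]:
    """Look the word up in the dictionary; fall back to letter rules."""
    word = word.lower()
    if word in PRONUNCIATION_DICT:
        return PRONUNCIATION_DICT[word]
    phonemes = []
    for ch in word:
        if ch.isalpha():
            phonemes.extend(LETTER_TO_PHONEME[ch].split())
    return phonemes

if __name__ == "__main__":
    for w in ("speech", "synthesis"):
        print(w, "->", " ".join(to_phonemes(w)))
```

In-vocabulary words ("speech") get their stored transcription, while out-of-vocabulary words ("synthesis") fall through to the rules, which is why corpus-based systems still keep some rule or model component as a safety net.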