Pronunciation modeling for speech technology
T. Svendsen
2004 International Conference on Signal Processing and Communications (SPCOM '04), published 2004-12-11
DOI: 10.1109/SPCOM.2004.1458347
Written text is based on an orthographic representation of words, i.e. linear sequences of letters. Modern speech technology (automatic speech recognition and text-to-speech synthesis) is based on phonetic units representing realizations of sounds. A mapping between the orthographic form and the phonetic forms representing the pronunciation is thus required. This may be obtained by creating pronunciation lexica and/or rule-based systems for grapheme-to-phoneme conversion. Traditionally, this mapping has been obtained manually, based on phonetic and linguistic knowledge. This approach has a number of drawbacks: i) the pronunciations represent typical pronunciations and have a limited capacity for describing pronunciation variation due to speaking style and dialectal/accent variation; ii) if multiple pronunciation variants are included, it does not indicate which variants are more significant for the specific application; iii) the description is based on phonetic knowledge and does not take into account that the units used in speech technology may deviate from the phonetic interpretation; and iv) the description is limited to units with a linguistic interpretation. The paper will present and discuss methods for modeling pronunciation and pronunciation variation specifically for applications in speech technology.
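To make the lexicon-plus-rules idea concrete, the following is a minimal sketch of the orthographic-to-phonetic mapping the abstract describes: a pronunciation lexicon consulted first, with a rule-based grapheme-to-phoneme fallback for out-of-vocabulary words. The lexicon entries, phone symbols (ARPAbet-style), and letter-to-phone rules are illustrative assumptions, not taken from the paper; real rule systems are context-dependent and far richer.

```python
# Sketch of grapheme-to-phoneme (G2P) conversion: lexicon lookup with a
# naive rule-based fallback. Entries and rules are illustrative only.

# Hand-built pronunciation lexicon (assumed entries, ARPAbet-style phones).
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "tech": ["T", "EH", "K"],
}

# Naive letter-to-phone rules, tried longest-grapheme-first at each position.
RULES = [
    ("ch", ["CH"]), ("sh", ["SH"]), ("ee", ["IY"]),
    ("a", ["AE"]), ("e", ["EH"]), ("i", ["IH"]), ("o", ["AA"]), ("u", ["AH"]),
    ("b", ["B"]), ("c", ["K"]), ("d", ["D"]), ("f", ["F"]), ("g", ["G"]),
    ("h", ["HH"]), ("j", ["JH"]), ("k", ["K"]), ("l", ["L"]), ("m", ["M"]),
    ("n", ["N"]), ("p", ["P"]), ("q", ["K"]), ("r", ["R"]), ("s", ["S"]),
    ("t", ["T"]), ("v", ["V"]), ("w", ["W"]), ("x", ["K", "S"]),
    ("y", ["Y"]), ("z", ["Z"]),
]

def g2p(word: str) -> list[str]:
    """Return a phone sequence: lexicon lookup first, rules as fallback."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word]
    phones, i = [], 0
    while i < len(word):
        for grapheme, phone_seq in RULES:
            if word.startswith(grapheme, i):
                phones.extend(phone_seq)
                i += len(grapheme)
                break
        else:
            i += 1  # skip characters no rule covers
    return phones

print(g2p("speech"))  # lexicon hit
print(g2p("cheek"))   # rule-based fallback
```

Note that this sketch exhibits exactly the drawbacks the abstract lists: it encodes one "typical" pronunciation per word with no ranking of variants, and its units are tied to a fixed phonetic interpretation.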