Pronunciation modeling for speech technology

2004 International Conference on Signal Processing and Communications, 2004. SPCOM '04. Pub Date: 2004-12-11 DOI: 10.1109/SPCOM.2004.1458347

T. Svendsen


Citations: 11

Abstract

Written text is based on an orthographic representation of words, i.e. linear sequences of letters. Modern speech technology (automatic speech recognition and text-to-speech synthesis) is based on phonetic units that represent the realization of sounds. A mapping between the orthographic form and the phonetic forms representing the pronunciation is thus required. This mapping may be obtained by creating pronunciation lexica and/or rule-based systems for grapheme-to-phoneme conversion. Traditionally, it has been created manually, based on phonetic and linguistic knowledge. This approach has a number of drawbacks: i) the pronunciations represent typical pronunciations and have a limited capacity for describing pronunciation variation due to speaking style and dialectal or accent differences; ii) if multiple pronunciation variants are included, the description does not indicate which variants are more significant for the specific application; iii) the description is based on phonetic knowledge and does not take into account that the units used in speech technology may deviate from the phonetic interpretation; and iv) the description is limited to units with a linguistic interpretation. The paper presents and discusses methods for modeling pronunciation and pronunciation variation specifically for applications in speech technology.
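To make the orthographic-to-phonetic mapping concrete, the following is a minimal Python sketch, not the method of the paper. It assumes a tiny hand-written lexicon with ARPAbet-like phoneme symbols, a toy one-letter-per-phoneme rule table as the grapheme-to-phoneme fallback, and relative-frequency weighting of observed variants as one simple way to indicate which variants matter for an application. All names (`LEXICON`, `LETTER_RULES`, `pronunciations`, `variant_weights`) and all entries are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical hand-written lexicon: word -> canonical pronunciations
# (space-separated, ARPAbet-like symbols chosen for illustration).
LEXICON = {
    "and": ["ae n d"],
    "data": ["d ey t ah", "d ae t ah"],
}

# Toy letter-to-sound rules; real rule systems are context dependent.
LETTER_RULES = {"a": "ae", "b": "b", "d": "d", "n": "n", "t": "t"}


def rule_based_g2p(word):
    """Map each letter to one phoneme; letters without a rule are skipped."""
    return " ".join(LETTER_RULES[c] for c in word.lower() if c in LETTER_RULES)


def pronunciations(word):
    """Return lexicon entries for the word, falling back to the rules."""
    return LEXICON.get(word.lower()) or [rule_based_g2p(word)]


def variant_weights(observed):
    """Estimate P(variant | word) by relative frequency.

    `observed` is an iterable of (word, pronunciation) pairs, e.g. taken
    from phonetically transcribed speech (assumed to be available).
    """
    counts = defaultdict(Counter)
    for word, pron in observed:
        counts[word][pron] += 1
    return {
        word: {pron: n / sum(c.values()) for pron, n in c.items()}
        for word, c in counts.items()
    }
```

In this sketch, `pronunciations("band")` is not in the lexicon and is produced by the rule fallback, while `variant_weights` shows how observed data could rank a reduced variant such as "ah n" above the canonical "ae n d" for the word "and".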

