Text Normalization for Bangla, Khmer, Nepali, Javanese, Sinhala and Sundanese Text-to-Speech Systems

Workshop on Spoken Language Technologies for Under-resourced Languages. Pub Date: 2018-08-29. DOI: 10.21437/SLTU.2018-31

Keshan Sanjaya Sodimana, Pasindu De Silva, R. Sproat, T. Wattanavekin, Alexander Gutkin, Knot Pipatsrisawat

Citations: 6

Abstract

Text normalization is the process of converting non-standard words (NSWs), such as numbers and abbreviations, into standard words so that their pronunciations can be derived by typical means (usually lexicon lookups). Text normalization is thus an important component of any text-to-speech (TTS) system. Without text normalization, the resulting voice may sound unintelligent. In this paper, we describe an approach to developing rule-based text normalization. We also describe our open source repository containing text normalization grammars and tests for Bangla, Javanese, Khmer, Nepali, Sinhala and Sundanese. Finally, we present a recipe for utilizing the grammars in a TTS system.
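
To make the idea of rule-based normalization concrete, the following is a minimal Python sketch, not the grammars described in the paper. It expands a few non-standard words (small integers and abbreviations) into standard words that a lexicon could then look up. English is used purely for readability, and the abbreviation table, the `number_to_words` helper and the `normalize` function are hypothetical names introduced for this illustration.

```python
import re

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"Dr.": "doctor", "km": "kilometers"}

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Verbalize an integer in the range 0..999 as English words."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} hundred"
    return head if rest == 0 else f"{head} {number_to_words(rest)}"


def normalize(text: str) -> str:
    """Replace each whitespace-separated NSW token with standard words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            # Rule 1: abbreviation lookup.
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"[0-9]{1,3}", token):
            # Rule 2: verbalize short ASCII digit strings as cardinals.
            out.append(number_to_words(int(token)))
        else:
            # Anything else is assumed to be a standard word already.
            out.append(token)
    return " ".join(out)


print(normalize("Dr. Silva walked 42 km"))
```

A real system would need language-specific number grammars, context-dependent rules (for example, reading a digit string as a year or a phone number) and handling of native-script digits, none of which this sketch attempts.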

