A study of tones and tempo in continuous Mandarin digit strings and their application in telephone quality speech recognition

5th International Conference on Spoken Language Processing (ICSLP 1998) | Pub Date: 1998-11-30 | DOI: 10.21437/ICSLP.1998-140

Chao Wang, S. Seneff

Citations: 33

Abstract

Prosodic cues (namely, fundamental frequency, energy and duration) provide important information for speech. For a tonal language such as Chinese, fundamental frequency (F0) also plays a critical role in characterizing tone, which is an essential phonemic feature. In this paper, we describe our work on duration and tone modeling for telephone-quality continuous Mandarin digits, and the application of these models to improve recognition. The duration modeling includes a speaking-rate normalization scheme. A novel F0 extraction algorithm is developed, and parameters based on orthonormal decomposition of the F0 contour are extracted for tone recognition. Context dependency is expressed by “tri-tone” models clustered into broader classes. A 20.0% error rate is achieved for four-tone classification. Over a baseline recognition performance of 5.1% word error rate, we achieve 31.4% error reduction with duration models, 23.5% error reduction with tone models, and 39.2% error reduction with duration and tone models combined.
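
The abstract reports improvements as error reductions rather than as absolute word error rates. The short Python sketch below derives the implied rates, assuming each reduction is relative to the 5.1% baseline; that reading is an assumption, since the abstract does not give the resulting rates directly. The helper name `wer_after_reduction` is hypothetical and used only for illustration.

```python
def wer_after_reduction(baseline_wer, relative_reduction):
    """Return the WER (percent) implied by a relative error reduction (percent)."""
    return baseline_wer * (1.0 - relative_reduction / 100.0)


# Baseline word error rate from the abstract, in percent.
BASELINE_WER = 5.1

# Error reductions reported in the abstract, in percent.
# Assumption: each is relative to the baseline above.
REDUCTIONS = {
    "duration": 31.4,
    "tone": 23.5,
    "duration + tone": 39.2,
}

# Implied absolute WER for each configuration, rounded to one decimal place.
implied = {
    name: round(wer_after_reduction(BASELINE_WER, reduction), 1)
    for name, reduction in REDUCTIONS.items()
}

for name, wer in implied.items():
    print(f"{name}: about {wer}% WER")
```

Under that assumption, the implied word error rates are roughly 3.5% with duration models, 3.9% with tone models, and 3.1% with both combined.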
