A study of tones and tempo in continuous Mandarin digit strings and their application in telephone quality speech recognition

Chao Wang, S. Seneff
In: 5th International Conference on Spoken Language Processing (ICSLP 1998), 30 November 1998. DOI: 10.21437/ICSLP.1998-140
Citations: 33

Abstract

Prosodic cues (namely, fundamental frequency, energy and duration) carry important information in speech. For a tonal language such as Chinese, fundamental frequency (F0) also plays a critical role in characterizing tone, which is an essential phonemic feature. In this paper, we describe our work on duration and tone modeling for telephone-quality continuous Mandarin digit strings, and the application of these models to improve recognition. The duration modeling includes a speaking-rate normalization scheme. A novel F0 extraction algorithm is developed, and parameters based on an orthonormal decomposition of the F0 contour are extracted for tone recognition. Context dependency is expressed by "tri-tone" models clustered into broader classes. A 20.0% error rate is achieved for four-tone classification. Over a baseline recognition performance of 5.1% word error rate, we achieve 31.4% error reduction with duration models, 23.5% error reduction with tone models, and 39.2% error reduction with duration and tone models combined.
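The abstract states that the duration models include a speaking-rate normalization scheme but does not give its form. The sketch below assumes one simple scheme, not taken from the paper: estimate an utterance-level rate as the observed total duration divided by the sum of the models' mean durations, divide every segment duration by that rate, and score the result under per-unit Gaussian duration models. `DurationModel`, `speaking_rate`, and `duration_log_likelihood` are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class DurationModel:
    """Hypothetical Gaussian model of one unit's rate-normalized duration (seconds)."""
    mean: float
    std: float


def speaking_rate(durations: Sequence[float], models: Sequence[DurationModel]) -> float:
    """Observed total duration over expected total; > 1 means slower than average."""
    expected = sum(m.mean for m in models)
    if expected <= 0:
        raise ValueError("model mean durations must sum to a positive value")
    return sum(durations) / expected


def duration_log_likelihood(
    durations: Sequence[float], models: Sequence[DurationModel]
) -> Tuple[float, float]:
    """Return (rate, total Gaussian log-likelihood of rate-normalized durations)."""
    if len(durations) != len(models):
        raise ValueError("need one duration model per segment")
    rate = speaking_rate(durations, models)
    score = 0.0
    for d, m in zip(durations, models):
        normalized = d / rate  # remove the utterance-level tempo (assumed scheme)
        z = (normalized - m.mean) / m.std
        score += -0.5 * z * z - math.log(m.std * math.sqrt(2.0 * math.pi))
    return rate, score
```

Under this assumed scheme, the same relative timing spoken at twice the tempo receives the same score, so the model penalizes unusual proportions between segments rather than a speaker's overall speed.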
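The abstract describes tone features taken from an orthonormal decomposition of the F0 contour without naming the basis. A minimal sketch follows, assuming a Legendre polynomial basis that is re-orthonormalized over the discrete frames of one syllable (via QR) and truncated at cubic order; the function name `orthonormal_contour_features` and the `degree=3` default are illustrative assumptions, not the authors' settings.

```python
import numpy as np


def orthonormal_contour_features(f0, degree=3):
    """Project one syllable's F0 contour onto a discrete orthonormal polynomial basis.

    Assumed basis: Legendre polynomials on normalized time [-1, 1], made exactly
    orthonormal over the sampled frames with a QR decomposition.
    Returns (features, reconstruction); features[0] equals the mean F0.
    """
    f0 = np.asarray(f0, dtype=float)
    n = f0.size
    if n <= degree:
        raise ValueError("need more F0 samples than the polynomial degree")
    # Map the syllable's frames onto normalized time in [-1, 1].
    t = np.linspace(-1.0, 1.0, n)
    # Column k holds the Legendre polynomial P_k(t) sampled at each frame.
    V = np.polynomial.legendre.legvander(t, degree)
    # Legendre polynomials are only orthogonal in the continuous sense;
    # QR yields columns that are exactly orthonormal over these n frames.
    Q, R = np.linalg.qr(V)
    # Fix signs so each basis vector correlates positively with P_k(t).
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    Q = Q * signs
    raw = Q.T @ f0
    reconstruction = Q @ raw
    # Divide by sqrt(n) so features are comparable across syllable lengths.
    features = raw / np.sqrt(n)
    return features, reconstruction
```

With this convention, `features[0]` is the syllable's mean F0 and the sign of `features[1]` separates rising from falling contours, the kind of shape information one would expect a four-tone classifier to rely on.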
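Reading the reported reductions as relative to the 5.1% baseline (the usual convention, assumed here rather than stated in the abstract), the implied absolute word error rates work out as follows.

```latex
\begin{aligned}
\mathrm{WER} &= \mathrm{WER}_{\mathrm{base}}\,(1 - r), \qquad \mathrm{WER}_{\mathrm{base}} = 5.1\% \\
\text{duration:} \quad & 5.1\% \times (1 - 0.314) \approx 3.5\% \\
\text{tone:} \quad & 5.1\% \times (1 - 0.235) \approx 3.9\% \\
\text{duration + tone:} \quad & 5.1\% \times (1 - 0.392) \approx 3.1\%
\end{aligned}
```

The combined system thus removes roughly two of every five baseline errors.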