Local Mismatch Phone for Confidence Measure in Standard and Accented Chinese Speech Recognition

Wenxiao Cao, Yi Liu, T. Zheng
International Symposium on Chinese Spoken Language Processing
DOI: 10.1109/CHINSL.2008.ECP.64 (https://doi.org/10.1109/CHINSL.2008.ECP.64)
Citations: 2

Abstract

High error rates in speech recognition are largely due to local phone mismatch caused by unclear speech or noise. In this paper, we propose using local mismatch phones to improve the reliability of confidence measures. Local mismatch phone features can be extracted from the recognized phone sequence by computing the occurrence frequency of each phone and comparing it with a preset threshold. The occurrence frequency is defined as the number of occurrences of the recognized phone in its frame-best phone sequence divided by the length of the interval. The frame-best phone is the symbol of the HMM state at the end of the maximum-likelihood token at a given frame. The effectiveness of this feature is evaluated on standard and accented Mandarin speech databases, where it gives significant Equal Error Rate reductions of 19.7% and 8.4%, respectively. In addition to being fast to compute, the feature is independent of the acoustic model and is convenient to combine with other features.
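The occurrence-frequency definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions the abstract does not confirm: the frame-best phone sequence is already available as one phone label per frame, each recognized phone comes with a frame interval, the interval length is counted in frames, and a phone is flagged as a local mismatch when its frequency falls below the threshold. The names `Segment`, `occurrence_frequency`, and `is_local_mismatch`, and the threshold value 0.6, are hypothetical.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    """A recognized phone and its frame interval (hypothetical structure)."""
    phone: str
    start: int  # first frame of the interval, inclusive
    end: int    # end of the interval, exclusive


def occurrence_frequency(seg: Segment, frame_best: List[str]) -> float:
    """Share of frames in the segment whose frame-best phone equals seg.phone.

    Assumes frame_best[t] is the frame-best phone label at frame t.
    """
    interval = seg.end - seg.start
    if interval <= 0 or seg.start < 0 or seg.end > len(frame_best):
        raise ValueError("segment must be a non-empty range inside frame_best")
    hits = sum(1 for label in frame_best[seg.start:seg.end] if label == seg.phone)
    return hits / interval


def is_local_mismatch(seg: Segment, frame_best: List[str], threshold: float) -> bool:
    """Flag the phone when its occurrence frequency is below the threshold.

    The direction of the comparison is an assumption, not taken from the paper.
    """
    return occurrence_frequency(seg, frame_best) < threshold


# Toy data: eight frames, two recognized phones of four frames each.
frame_best = ["a", "a", "a", "o", "b", "p", "p", "b"]
segments = [Segment("a", 0, 4), Segment("b", 4, 8)]
flags = [is_local_mismatch(s, frame_best, threshold=0.6) for s in segments]
```

In this toy case the first phone matches three of its four frames and the second matches two of four, so only the second is flagged at a threshold of 0.6. Because the computation uses only label sequences, it does not touch acoustic scores, which is consistent with the abstract's remark that the feature is independent of the acoustic model.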