Leveraging phonetic context dependent invariant structure for continuous speech recognition

Congying Zhang, Masayuki Suzuki, Gakuto Kurata, M. Nishimura, N. Minematsu
Published in: 2014 IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP)
Publication date: 2014-07-09
DOI: 10.1109/ChinaSIP.2014.6889200 (https://doi.org/10.1109/ChinaSIP.2014.6889200)
Citations: 0

Abstract

Speech acoustics vary intrinsically because of both linguistic and non-linguistic factors. The invariant structure extracted from an utterance is a long-span acoustic representation from which the variation caused by non-linguistic factors can be removed to a reasonable degree. It expresses the spectral contrasts between acoustic events in the utterance. In previous studies, the invariant structure was used in continuous speech recognition to rerank the N-best candidates hypothesized by a conventional automatic speech recognition (ASR) system. Reranking with invariant structure features was effective, but the features were defined, or labeled, in a phonetic-context-independent way. This paper examines the use of phonetic context to define invariant structure features. The proposed method is tested on two tasks: continuous digit speech recognition and large vocabulary continuous speech recognition (LVCSR). Performance improves by 4.7% and 1.2% relative, respectively.
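
The idea of a representation built from contrasts between acoustic events can be illustrated with a small sketch. The abstract does not state how events are modeled or which contrast measure is used, so everything here is an assumption made for illustration: each event is a one-dimensional Gaussian (mean, variance), the contrast is the Bhattacharyya distance, and a non-linguistic factor is imitated by an affine map applied to every event. The function names `bhattacharyya_1d`, `structure_matrix`, and `affine_map` are hypothetical and are not taken from the paper.

```python
import math


def bhattacharyya_1d(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate Gaussians (assumed measure)."""
    mean_term = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term


def structure_matrix(events):
    """Symmetric matrix of pairwise contrasts between all events of an utterance."""
    n = len(events)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = bhattacharyya_1d(*events[i], *events[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


def affine_map(events, scale, shift):
    """Apply x -> scale * x + shift to every event (a stand-in for a speaker change)."""
    return [(scale * m + shift, scale * scale * v) for m, v in events]


# Toy events as (mean, variance) pairs; the values are made up for the demo.
events = [(0.0, 1.0), (1.5, 0.5), (-2.0, 2.0), (3.0, 1.2)]

original = structure_matrix(events)
warped = structure_matrix(affine_map(events, scale=-2.5, shift=4.0))

# Largest element-wise difference between the two contrast matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(original, warped)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

Under these assumptions the two matrices agree up to floating-point rounding, because the Bhattacharyya distance between Gaussians is unchanged when the same invertible affine map is applied to both. The individual means and variances change, while the pairwise contrasts do not, which is the sense in which such a structure is invariant.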