Frame-based Phonotactic Language Identification

Kyu Jeong Han, Jason W. Pelecanos
Published in: 2012 IEEE Spoken Language Technology Workshop (SLT), 2012-12-01
DOI: 10.1109/SLT.2012.6424240 (https://doi.org/10.1109/SLT.2012.6424240)
Citations: 7

Abstract

This paper describes a frame-based phonotactic Language Identification (LID) system that was used for the LID evaluation of the Robust Automatic Transcription of Speech (RATS) program run by the Defense Advanced Research Projects Agency (DARPA). The proposed approach uses features derived from the frame-level phone log-likelihoods produced by a phone recognizer. It aims to capture not only phone sequence information but also short-term timing information for phone N-gram events, which conventional phonotactic LID systems lack because they simply count phone N-gram events. With this method, we achieved a 26% relative improvement in Cavg on the RATS LID evaluation data compared to modeling phone N-gram counts. We also observed that it had a significant impact on score combination with our best acoustic system, which is based on Mel-Frequency Cepstral Coefficients (MFCCs).
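
The contrast the abstract draws between counting phone N-grams and working at the frame level can be illustrated with a small sketch. This is not the authors' system: the phone inventory, the toy log-likelihood values, the context-window stacking, and every function name below are assumptions chosen only to show why N-gram counts discard short-term timing while frame-level phone log-likelihoods retain it.

```python
import math
from collections import Counter

PHONES = ["a", "b", "c"]  # hypothetical phone inventory


def collapse_frames(frame_labels):
    """Merge runs of identical frame labels into a phone sequence (durations are lost)."""
    phones = []
    for label in frame_labels:
        if not phones or phones[-1] != label:
            phones.append(label)
    return phones


def ngram_counts(phones, n=2):
    """Conventional phonotactic feature: counts of phone N-grams in a decoded sequence."""
    return Counter(tuple(phones[i:i + n]) for i in range(len(phones) - n + 1))


def toy_loglikes(frame_labels, hit=0.8):
    """Stand-in for a phone recognizer: one log-likelihood vector per frame (made-up values)."""
    miss = (1.0 - hit) / (len(PHONES) - 1)
    return [[math.log(hit if p == label else miss) for p in PHONES]
            for label in frame_labels]


def stacked_frame_features(frame_loglikes, context=1):
    """Concatenate each frame's vector with its neighbours within +/- context frames.

    This is one simple, assumed way to expose short-term timing to a classifier.
    """
    num_frames = len(frame_loglikes)
    feats = []
    for t in range(num_frames):
        window = []
        for offset in range(-context, context + 1):
            idx = min(max(t + offset, 0), num_frames - 1)  # clamp at utterance edges
            window.extend(frame_loglikes[idx])
        feats.append(window)
    return feats


# Two toy utterances: same phone order, different phone durations.
fast = ["a", "b", "b", "c"]
slow = ["a", "a", "a", "b", "c", "c"]

same_counts = ngram_counts(collapse_frames(fast)) == ngram_counts(collapse_frames(slow))
fast_feats = stacked_frame_features(toy_loglikes(fast), context=1)
slow_feats = stacked_frame_features(toy_loglikes(slow), context=1)
```

In this toy case the two utterances yield identical bigram counts, so a count-based model cannot tell them apart, whereas the stacked frame-level features differ because they still reflect how long each phone lasts.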