Sub-word modeling of out of vocabulary words in spoken term detection

2008 IEEE Spoken Language Technology Workshop, Pub. date: 2008-12-01, DOI: 10.1109/SLT.2008.4777893

Igor Szöke, L. Burget, J. Černocký, M. Fapšo

Citations: 73

Abstract

This paper compares sub-word-based methods for the spoken term detection (STD) task and for phone recognition. Sub-word units are needed to search for out-of-vocabulary (OOV) words. We compared words, phones and multigrams. The maximal length and pruning of multigrams were investigated first; then two constrained methods of multigram training were proposed. Evaluation was carried out on the conversational telephone speech (CTS) data of the NIST STD06 development set. We conclude that the proposed method improves phone accuracy by more than 9% relative and STD accuracy by more than 7% relative.
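The abstract does not spell out how multigrams are segmented or pruned. As a rough illustration only, the sketch below assumes the standard unigram multigram model: a phone string is split into variable-length units of at most `max_len` phones by Viterbi dynamic programming, maximizing the product of unit probabilities. The function names `segment` and `prune_inventory`, the probability-threshold pruning rule (which always keeps single phones so every string stays segmentable), and the toy inventory are hypothetical and are not taken from the paper; multigram training typically alternates such a segmentation step with re-estimation of the unit probabilities.

```python
import math


def prune_inventory(probs, min_prob):
    """Drop multigram units with probability below min_prob.

    Single-phone units are always kept so that any phone string remains
    segmentable; the surviving probabilities are renormalized to sum to 1.
    (Hypothetical pruning rule, for illustration only.)
    """
    kept = {u: p for u, p in probs.items() if p >= min_prob or len(u) == 1}
    total = sum(kept.values())
    return {u: p / total for u, p in kept.items()}


def segment(phones, probs, max_len):
    """Viterbi segmentation of a phone sequence into multigram units.

    Units are tuples of at most max_len phones; the best segmentation
    maximizes the sum of unit log-probabilities (unigram multigram model).
    Returns (units, log_prob).
    """
    n = len(phones)
    best = [-math.inf] * (n + 1)  # best[i]: best log-prob of phones[:i]
    back = [0] * (n + 1)          # back[i]: length of the last unit ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            p = probs.get(tuple(phones[start:end]))
            if p is None or p <= 0.0 or best[start] == -math.inf:
                continue  # unit not in inventory, or prefix unreachable
            score = best[start] + math.log(p)
            if score > best[end]:
                best[end] = score
                back[end] = length
    if best[n] == -math.inf:
        raise ValueError("phone sequence cannot be segmented with this inventory")
    # Trace back from the end of the sequence to recover the unit boundaries.
    units, end = [], n
    while end > 0:
        length = back[end]
        units.append(tuple(phones[end - length:end]))
        end -= length
    units.reverse()
    return units, best[n]


if __name__ == "__main__":
    # Toy inventory (hypothetical probabilities, not from the paper).
    probs = {("k",): 0.1, ("ae",): 0.1, ("t",): 0.1,
             ("k", "ae"): 0.2, ("ae", "t"): 0.2, ("k", "ae", "t"): 0.3}
    print(segment(["k", "ae", "t"], probs, max_len=3))
    print(segment(["k", "ae", "t"], probs, max_len=1))
    print(prune_inventory(probs, min_prob=0.25))
```

With this toy inventory, `max_len=3` lets the three-phone unit `("k", "ae", "t")` win as a single segment, while `max_len=1` forces a phone-by-phone split; pruning at `min_prob=0.25` keeps only that three-phone unit plus the single phones. Raising the maximal length lets longer units compete, and pruning trims rare units from the inventory, which are the two knobs the abstract says were investigated.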
