Word-lattice based spoken-document indexing with standard text indexers

2008 IEEE Spoken Language Technology Workshop. Pub Date: 2008-12-01. DOI: 10.1109/SLT.2008.4777898

F. Seide, K. Thambiratnam, Roger Peng Yu

Citations: 12

Abstract

Indexing the spoken content of audio recordings requires automatic speech recognition, which is still unreliable today. Unlike with text, a speech recognizer cannot tell us with certainty whether a word is present at a given point in the audio; it can only give us a probability. Correct use of these probabilities significantly improves spoken-document search accuracy. In this paper, we first describe how to improve accuracy for "web-search style" (AND/phrase) queries into audio by utilizing speech recognition alternates and word posterior probabilities derived from word lattices. We then present an end-to-end approach to doing so with standard text indexers, which by design cannot handle probabilities or unaligned alternates. We present a sequence of approximations that transform the numeric lattice-matching problem into a symbolic, text-based one that can be implemented with a commercial full-text indexer. Experiments on a 170-hour lecture set show accuracy improvements of 30-60% for phrase searches and 130% for two-term AND queries, compared to indexing linear text.
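To make the contrast between indexing linear text and using word posteriors concrete, here is a minimal Python sketch. It is not the paper's method: the aligned slot structure (closer to a confusion network than to a true word lattice), the independence assumption between slots, the scoring functions, and all words and probabilities are assumptions chosen purely for illustration.

```python
from math import prod

# Toy stand-in for a word lattice: one list of (word, posterior) alternates per
# time slot. Real lattices are graphs with unaligned alternates; this aligned
# form and all numbers below are assumptions made for illustration.
SLOTS = [
    [("speech", 0.6), ("beach", 0.4)],
    [("recognition", 0.7), ("wreck", 0.3)],
    [("lattice", 0.45), ("lettuce", 0.55)],
]


def one_best(slots):
    """Linear transcript: keep only the top-scoring word in each slot."""
    return [max(alts, key=lambda a: a[1])[0] for alts in slots]


def term_probability(slots, term):
    """Probability that `term` occurs at least once, treating slots as independent."""
    return 1.0 - prod(1.0 - p for alts in slots for w, p in alts if w == term)


def and_score(slots, terms):
    """Score an AND query as the product of per-term presence probabilities."""
    return prod(term_probability(slots, t) for t in terms)


def phrase_score(slots, phrase):
    """Best product of posteriors over consecutive slots that match the phrase."""
    best = 0.0
    for start in range(len(slots) - len(phrase) + 1):
        score = 1.0
        for offset, word in enumerate(phrase):
            # A word missing from a slot contributes probability 0.
            score *= dict(slots[start + offset]).get(word, 0.0)
        best = max(best, score)
    return best
```

In this toy data the 1-best transcript contains "lettuce" rather than "lattice", so a text index built from `one_best(SLOTS)` would return no hit for the AND query `["speech", "lattice"]`, while `and_score(SLOTS, ["speech", "lattice"])` still yields a nonzero score that can be ranked or thresholded.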


