Open vocabulary spoken document retrieval by subword sequence obtained from speech recognizer

2008 IEEE Spoken Language Technology Workshop. Pub Date: 2008-12-01. DOI: 10.1109/SLT.2008.4777900

Go Kuriki, Y. Itoh, K. Kojima, M. Ishigame, Kazuyo Tanaka, Shi-wook Lee

Citations: 0

Abstract

We present a method for open-vocabulary retrieval in a spoken document retrieval (SDR) system using subword models. The paper proposes a new approach to open-vocabulary SDR using subword models that does not require subword recognition. Instead, subword sequences are obtained from the phone sequence of the word sequence output by a speech recognizer. When an utterance contains an out-of-vocabulary (OOV) word, the speech recognizer outputs a word sequence whose phone sequence is considered to be similar to that of the OOV word. When OOV words are provided in a query, the proposed system is therefore able to retrieve the target section by comparing the phone sequence of the query with that of the word sequence generated by the speech recognizer.
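The matching idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how phone sequences are compared, so this example assumes plain edit distance with a free start and end position in the document (approximate substring matching). The lexicon, the ARPAbet-style phone labels, the query pronunciation, and the names `words_to_phones` and `best_match` are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical pronunciation lexicon (word -> phones); illustrative only.
LEXICON: Dict[str, List[str]] = {
    "the": ["dh", "ah"],
    "new": ["n", "uw"],
    "key": ["k", "iy"],
    "oh": ["ow"],
    "tower": ["t", "aw", "er"],
    "opened": ["ow", "p", "ah", "n", "d"],
}


def words_to_phones(words: List[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Expand a recognized word sequence into one flat phone sequence."""
    phones: List[str] = []
    for word in words:
        phones.extend(lexicon[word])
    return phones


def best_match(query: List[str], document: List[str]) -> Tuple[int, int]:
    """Return (minimum edit distance, 1-based end position) of the query
    matched against any contiguous span of the document phone sequence.

    Assumption: unit-cost substitutions, insertions, and deletions.
    """
    m = len(query)
    # prev[i] = cost of aligning query[:i] with a span ending at the
    # previous document position; an empty document costs i deletions.
    prev = list(range(m + 1))
    best = (prev[m], 0)
    for j, doc_phone in enumerate(document, start=1):
        cur = [0]  # a match may start anywhere in the document for free
        for i in range(1, m + 1):
            substitute = prev[i - 1] + (query[i - 1] != doc_phone)
            skip_doc_phone = prev[i] + 1
            drop_query_phone = cur[i - 1] + 1
            cur.append(min(substitute, skip_doc_phone, drop_query_phone))
        if cur[m] < best[0]:
            best = (cur[m], j)
        prev = cur
    return best


if __name__ == "__main__":
    # The recognizer has no entry for the OOV query word, so it emits
    # in-vocabulary words that sound similar ("key oh tower").
    recognized = "the new key oh tower opened".split()
    document = words_to_phones(recognized, LEXICON)
    query = ["k", "y", "ow", "t", "ow"]  # assumed pronunciation of the OOV query
    distance, end = best_match(query, document)
    print(distance, end)
```

In a retrieval setting, a section would be returned when the distance falls below some threshold. The threshold, and any normalization by query length, would have to be tuned and are not specified in the abstract.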

