Index structures for efficiently searching natural language text

Proceedings of the 19th ACM International Conference on Information and Knowledge Management. Pub Date: 2010-10-26. DOI: 10.1145/1871437.1871527

P. Chubak, Davood Rafiei

Citations: 6

Abstract

Many existing indexes on text work at the document granularity and are not effective in answering the class of queries where the desired answer is only a term or a phrase. In this paper, we study some of the index structures that are capable of answering this class of queries, referred to here as wild card queries, and analyze their performance. Our experimental results on a large class of queries from different sources (including query logs and parse trees) and with various datasets reveal some of the performance barriers of these indexes. We then present the Word Permuterm Index (WPI), an adaptation of the permuterm index for natural language text applications, and show that this index supports a wide range of wild card queries, is quick to construct, and is highly scalable. Our experimental results comparing WPI to alternative methods on a wide range of wild card queries show performance improvements of a few orders of magnitude for WPI, while the memory usage is kept the same for all compared systems.
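
The abstract does not describe how WPI is built, so the following is only a toy sketch of the general permuterm idea lifted from characters to words, not the authors' implementation. It assumes each sentence is indexed as a sequence of whitespace-separated words, that a query contains exactly one `*` wild card and must match a whole sentence, and that the token `$` never occurs in the text. The names `build_index` and `search` are hypothetical. Every rotation of a sentence (with an end marker appended) is stored in sorted order; rotating the query so the wild card falls at the end turns it into a prefix lookup, and whatever follows the prefix is the phrase that fills the wild card.

```python
from bisect import bisect_left

END = "$"  # assumed end-of-sentence marker; must not appear as a word in the text


def build_index(sentences):
    """Return a sorted list of (rotation, sentence_id) pairs over word tuples."""
    entries = []
    for sid, sentence in enumerate(sentences):
        words = tuple(sentence.lower().split()) + (END,)
        # Store every rotation of the word sequence, as a permuterm index
        # does for the characters of a single term.
        for i in range(len(words)):
            entries.append((words[i:] + words[:i], sid))
    entries.sort()
    return entries


def search(index, query):
    """Answer a query with exactly one '*'; return the phrases that fill it."""
    words = query.lower().split()
    star = words.index("*")  # raises ValueError if the query has no wild card
    before, after = tuple(words[:star]), tuple(words[star + 1:])
    # Rotate "before * after $" so the wild card is last: "after $ before *".
    prefix = after + (END,) + before
    pos = bisect_left(index, (prefix,))
    fillers = []
    for rotation, _sid in index[pos:]:
        if rotation[:len(prefix)] != prefix:
            break  # rotations sharing the prefix are contiguous in sorted order
        fillers.append(" ".join(rotation[len(prefix):]))
    return fillers


if __name__ == "__main__":
    corpus = [
        "Ottawa is the capital of Canada",
        "Paris is the capital of France",
    ]
    idx = build_index(corpus)
    print(search(idx, "* is the capital of Canada"))  # ['ottawa']
    print(search(idx, "Paris * France"))              # ['is the capital of']
```

This sketch stores every rotation explicitly, so its size grows quadratically with sentence length; it is meant only to show why a single sorted structure can return a term or phrase rather than a document.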


