Needles and Haystacks: a search engine for personal information collections

Proceedings 23rd Australasian Computer Science Conference. ACSC 2000 (Cat. No.PR00518). Pub Date: 2000-01-31. DOI: 10.1109/ACSC.2000.824381

Owen de Kretser, Alistair Moffat

Citations: 6

Abstract

Information retrieval systems can be partitioned into two main classes: large-scale systems that make use of an inverted index or some other auxiliary data structure, intended for massive volumes of data; and the small-scale systems based upon sequential pattern matching that most computer users employ when hunting for missing email and news items. In this paper we describe a hybrid approach that offers the ranked queries and similarity matching of a genuine information retrieval system, but does so without any need for an index to be precomputed. This software tool, which we call seft, offers performance that in a retrieval effectiveness sense matches conventional information retrieval systems, and in a resource efficiency sense, while considerably slower than grep-like tools, is fast enough to be useful on hundreds of megabytes of text.
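To make the idea of ranked querying without a precomputed index concrete, here is a minimal sketch that scores every document in a single sequential pass and returns the best matches. It is not the algorithm used by seft, which the abstract does not specify; the function names `tokenize` and `rank_by_scan`, the `top_k` parameter, and the length-normalised term-frequency score are all assumptions chosen purely for illustration.

```python
import heapq
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_scan(documents, query, top_k=3):
    """Rank documents against a query by scanning each one; no index is built.

    documents: dict mapping a document id to its text.
    Returns up to top_k document ids, best match first.
    """
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        if not tokens:
            continue
        # Count only the tokens that appear in the query.
        counts = Counter(t for t in tokens if t in query_terms)
        # Hypothetical score: matched-token count normalised by document length.
        score = sum(counts.values()) / len(tokens)
        if score > 0:
            scored.append((score, doc_id))
    # Keep the top_k highest-scoring documents.
    return [doc_id for _, doc_id in heapq.nlargest(top_k, scored)]


if __name__ == "__main__":
    mail = {
        "a": "missing email about the budget",
        "b": "news item on sport",
        "c": "email email budget",
    }
    print(rank_by_scan(mail, "email budget"))
```

Because every query rescans the whole collection, the cost grows linearly with the amount of text. That is the trade-off the abstract describes: slower than grep-like tools, but with no index to build or maintain.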

