DHTs over Peer Clusters for Distributed Information Retrieval

21st International Conference on Advanced Information Networking and Applications (AINA '07)
Pub Date: 2007-05-21
DOI: 10.1109/AINA.2007.60

Odysseas Papapetrou, W. Siberski, Wolf-Tilo Balke, W. Nejdl

Citations: 14

Abstract

Distributed hash tables (DHTs) are very efficient for querying based on key lookups, provided that each individual peer has to register only a small number of keys. However, building the huge term indexes required for IR-style keyword search is impractical with plain DHTs. Because document term vocabularies are large, every joining peer causes a huge number of key inserts and, subsequently, a large number of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance. We show that this can be achieved by combining DHTs with peer clustering. Peers are first clustered into communities, each of which has a representative super-peer. All occurrences of a term in a community are then published to the global DHT in a single batch by that super-peer. Our evaluation shows that this reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
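
The batching idea in the abstract can be illustrated with a toy message count. The following is a minimal sketch, not the authors' implementation: the function names, the in-memory dictionary standing in for the global DHT, the peer and community data, and the cost model (one message per DHT insert, with traffic inside a community ignored) are all assumptions made for illustration.

```python
from collections import defaultdict


def plain_dht_publish(peers):
    """Every peer publishes each of its terms itself: one message per (peer, term)."""
    index = defaultdict(set)  # term -> ids of peers holding the term
    messages = 0
    for peer_id, terms in peers.items():
        for term in terms:
            index[term].add(peer_id)
            messages += 1
    return index, messages


def clustered_publish(peers, communities):
    """Each super-peer merges its members' terms and sends one batch per distinct term."""
    index = defaultdict(set)
    messages = 0
    for members in communities.values():
        batch = defaultdict(set)  # term -> member peers of this community
        for peer_id in members:
            for term in peers[peer_id]:
                batch[term].add(peer_id)
        for term, holders in batch.items():
            index[term] |= holders  # a single insert carries all holders
            messages += 1
    return index, messages


# Hypothetical toy data: five peers grouped into two communities.
peers = {
    "p1": {"dht", "index", "peer"},
    "p2": {"dht", "index", "query"},
    "p3": {"dht", "cluster", "peer"},
    "p4": {"music", "audio"},
    "p5": {"music", "audio", "index"},
}
communities = {"sp_a": ["p1", "p2", "p3"], "sp_b": ["p4", "p5"]}

plain_index, plain_msgs = plain_dht_publish(peers)
clustered_index, clustered_msgs = clustered_publish(peers, communities)

print(plain_msgs, clustered_msgs)      # 14 8
print(plain_index == clustered_index)  # True
```

In this toy setting both strategies build the same term index, but the clustered variant needs fewer inserts because terms shared within a community are published once. The size of the saving here depends entirely on the made-up data and says nothing about the order-of-magnitude reduction reported in the evaluation.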



