DHTs over Peer Clusters for Distributed Information Retrieval

Odysseas Papapetrou, W. Siberski, Wolf-Tilo Balke, W. Nejdl
{"title":"DHTs over Peer Clusters for Distributed Information Retrieval","authors":"Odysseas Papapetrou, W. Siberski, Wolf-Tilo Balke, W. Nejdl","doi":"10.1109/AINA.2007.60","DOIUrl":null,"url":null,"abstract":"Distributed hash tables (DHTs) are very efficient for querying based on key lookups, if only a small number of keys has to be registered by each individual peer. However, building huge term indexes, as required for IR-style keyword search, are impractical with plain DHTs. Due to the large sizes of document term vocabularies, joining peers cause huge amounts of key inserts, and subsequently large numbers of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance. We show that this can be achieved by combining DHTs with peer clustering. Peers are first clustered into communities, each of the communities having a representative super-peer. Then all occurrences of a term in a community are published to the global DHT in a batch by the representative super-peer. Our evaluation shows that this reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.","PeriodicalId":361109,"journal":{"name":"21st International Conference on Advanced Information Networking and Applications (AINA '07)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"21st International Conference on Advanced Information Networking and Applications (AINA '07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/AINA.2007.60","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 14

Abstract

Distributed hash tables (DHTs) are very efficient for querying based on key lookups, if only a small number of keys has to be registered by each individual peer. However, building huge term indexes, as required for IR-style keyword search, are impractical with plain DHTs. Due to the large sizes of document term vocabularies, joining peers cause huge amounts of key inserts, and subsequently large numbers of index maintenance messages. Thus, the key to exploiting DHTs for distributed information retrieval is to reduce index maintenance. We show that this can be achieved by combining DHTs with peer clustering. Peers are first clustered into communities, each of the communities having a representative super-peer. Then all occurrences of a term in a community are published to the global DHT in a batch by the representative super-peer. Our evaluation shows that this reduces index maintenance cost by an order of magnitude, while still keeping a complete and correct term index for query processing.
基于对等集群的分布式信息检索dht
分布式哈希表(dht)对于基于键查找的查询非常有效,如果每个单独的对等体只需要注册少量的键。然而,对于普通dht来说,构建ir风格关键字搜索所需的大型术语索引是不切实际的。由于文档术语词汇表的大小很大,连接对等节点会导致大量的键插入,以及随后的大量索引维护消息。因此,利用dht进行分布式信息检索的关键是减少索引维护。我们表明,这可以通过将dht与对等集群相结合来实现。节点首先聚集到社区中,每个社区都有一个具有代表性的超级节点。然后,社区中出现的所有术语都由具有代表性的超级对等体批量发布到全局DHT。我们的评估表明,这将索引维护成本降低了一个数量级,同时仍然为查询处理保持完整和正确的术语索引。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信