Comparison between document-based, term-based and hybrid partitioning

A. Abusukhon, M. Oakes, M. Talib, A. M. Abdalla
Published in: 2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT)
Publication date: 2008-10-31
DOI: 10.1109/ICADIWT.2008.4664324 (https://doi.org/10.1109/ICADIWT.2008.4664324)
Citation count: 6

Abstract

Information retrieval (IR) systems for large-scale data collections must build an index in order to provide efficient retrieval that meets the user's needs. In distributed IR systems, query response time is affected by the way in which the data collection is partitioned across nodes. There are three types of collection partitioning: document-based partitioning (called the local index), term-based partitioning (called the global index) and hybrid partitioning. In this paper, we compare the three types of partitioning in terms of average query response time for a system with one broker and six other nodes. Our results showed that, within our distributed IR system, document-based and hybrid partitioning outperformed term-based partitioning. However, unlike Xi et al., we did not find that hybrid partitioning was any better than document-based partitioning in terms of average query response time.
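
The difference between the partitioning schemes can be illustrated with a small sketch. The code below is not the system evaluated in the paper; it is a minimal Python illustration over a toy collection, and all function names (build_index, document_partition, term_partition, hybrid_partition, search) are hypothetical. The abstract does not define hybrid partitioning, so the chunk-based variant shown here (posting lists cut into fixed-size chunks that are spread across nodes) is an assumption made only for illustration.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def document_partition(docs, num_nodes):
    """Local index: each node indexes only its own subset of documents."""
    shards = [dict() for _ in range(num_nodes)]
    for i, doc_id in enumerate(sorted(docs)):
        shards[i % num_nodes][doc_id] = docs[doc_id]
    return [build_index(shard) for shard in shards]

def term_partition(docs, num_nodes):
    """Global index: each node holds the complete posting lists of some terms."""
    global_index = build_index(docs)
    nodes = [dict() for _ in range(num_nodes)]
    for i, term in enumerate(sorted(global_index)):
        nodes[i % num_nodes][term] = global_index[term]
    return nodes

def hybrid_partition(docs, num_nodes, chunk_size=1):
    """Assumed hybrid scheme: cut posting lists into chunks, assign round-robin."""
    global_index = build_index(docs)
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    slot = 0
    for term in sorted(global_index):
        postings = global_index[term]
        for start in range(0, len(postings), chunk_size):
            chunk = postings[start:start + chunk_size]
            nodes[slot % num_nodes][term].extend(chunk)
            slot += 1
    return [dict(node) for node in nodes]

def search(nodes, term):
    """Broker: query every node for the term and merge the partial results."""
    hits = set()
    for node in nodes:
        hits.update(node.get(term, []))
    return sorted(hits)

# Toy collection (illustrative only, not the paper's data).
docs = {
    1: "distributed index",
    2: "index partitioning",
    3: "query broker",
    4: "distributed query",
}
```

In this sketch the broker always contacts every node for simplicity. Under term-based partitioning only the node that owns the query term returns postings, whereas under document-based partitioning each node may return a partial list that the broker must merge.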