Compression of Inverted Index for Comprehensive Performance Evaluation in Lucene

Xianghua Xu, Shengyi Pan, Jian Wan
2010 Third International Joint Conference on Computational Science and Optimization, published 2010-05-28
DOI: 10.1109/CSO.2010.126 (https://doi.org/10.1109/CSO.2010.126)
Citations: 6

Abstract

The inverted index is the most widely used index structure in search engines. Index compression reduces the storage space of the inverted index and can improve search performance. In this paper, we present a comprehensive performance evaluation of three state-of-the-art index compression schemes in Lucene, an open-source information retrieval system. We focus on the compression and storage of the document IDs, term frequencies, and position information in Lucene's word-level inverted index. The main contributions are: 1) the impact of the if-then-else construction in the decompression process on performance in a Java environment; 2) each algorithm's compression ratio on data sets of different scales; 3) a performance comparison of term search and phrase search; and 4) whether interleaving the index files produces significant differences in compression ratio and decompression speed. Detailed experimental results and analysis are given.
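To make "compressing document IDs" concrete, the following is a minimal sketch of the classic baseline, not one of the three schemes evaluated in the paper (the abstract does not name them). A sorted posting list is stored as gaps (d-gaps) between consecutive docIDs, and each gap is written as a variable-byte integer: 7 data bits per byte, low-order bits first, with the high bit set when more bytes follow, the same layout as Lucene's VInt. The class name `GapVByte` and its methods are hypothetical, and the docIDs are assumed to be non-negative and strictly increasing. The continuation-bit test in `decode` is one example of the data-dependent branching that decompression loops contain.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical helper: d-gap + variable-byte coding of a sorted docID list. */
final class GapVByte {

    /** Encodes non-negative, strictly increasing docIDs as variable-byte d-gaps. */
    static byte[] encode(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int id : docIds) {
            int gap = id - prev;   // store the difference, not the absolute ID
            prev = id;
            // Emit 7 bits at a time, low-order first; high bit 0x80 = "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);        // final byte has the high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes `count` docIDs from a byte stream produced by encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int b = bytes[pos++] & 0xFF;
            int gap = b & 0x7F;
            // Data-dependent branch: keep reading while the continuation bit is set.
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++] & 0xFF;
                gap |= (b & 0x7F) << shift;
            }
            prev += gap;           // undo the delta to recover the absolute docID
            docIds[i] = prev;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] postings = {3, 7, 150, 151, 20000};
        byte[] packed = encode(postings);
        System.out.println(packed.length + " bytes vs " + 4 * postings.length + " raw");
        System.out.println(Arrays.toString(decode(packed, postings.length)));
    }
}
```

Running `main` prints `8 bytes vs 20 raw` and then `[3, 7, 150, 151, 20000]`: the gaps 3, 4, 143, 1 and 19849 take 1, 1, 2, 1 and 3 bytes. The same gap-plus-VInt idea applies to the positions of a term within a single document, since those are also increasing.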