Compression of Inverted Index for Comprehensive Performance Evaluation in Lucene
Authors: Xianghua Xu, Shengyi Pan, Jian Wan
Published in: 2010 Third International Joint Conference on Computational Science and Optimization
Publication date: 2010-05-28
DOI: 10.1109/CSO.2010.126 (https://doi.org/10.1109/CSO.2010.126)
Citation count: 6
Abstract
The inverted index is the most popular index structure in search engines. Compressing an inverted index reduces the storage space it requires and can improve search performance. In this paper, we present a comprehensive performance evaluation of three state-of-the-art index compression schemes on Lucene, an open-source information retrieval system. We focus on the compression and storage of the document ID, frequency, and position information in Lucene's word-level inverted index. The main work covers: 1) the impact of if-then-else constructs in the decompression process on performance in a Java environment; 2) each algorithm's compression ratio on data sets of different scales; 3) a performance comparison of term search and phrase search; 4) whether interleaving the index file makes a notable difference in compression ratio and decompression speed. The experimental results and analysis are given in detail.
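The abstract does not name the three schemes evaluated, so the following is only an illustrative sketch of the general idea. It assumes variable-byte coding of document-ID gaps, a common baseline for postings compression. The class `PostingsVByte` and its methods are hypothetical names introduced for this example and are not part of the Lucene API. The decoder's loop condition is one instance of the kind of data-dependent branching that item 1) is concerned with.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration; not the paper's code and not a Lucene class.
class PostingsVByte {

    /** Encodes strictly increasing, non-negative document IDs as variable-byte gaps. */
    static byte[] encode(int[] docIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : docIds) {
            int gap = docId - previous; // store the difference, not the absolute ID
            previous = docId;
            // Emit 7 payload bits per byte; a set high bit means "more bytes follow".
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Decodes {@code count} document IDs produced by {@link #encode}. */
    static int[] decode(byte[] data, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = data[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0); // data-dependent branch on the continuation bit
            previous += gap;           // prefix sum restores the absolute ID
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 150, 20000};
        byte[] encoded = encode(docIds);
        System.out.println(encoded.length + " bytes, versus "
                + docIds.length * Integer.BYTES + " bytes uncompressed");
        System.out.println(Arrays.toString(decode(encoded, docIds.length)));
    }
}
```

Small gaps fit in a single byte, which is why sorting postings by document ID and storing differences pays off. Frequencies and positions can be coded the same way, either in separate streams or interleaved with the document IDs, which is the layout question raised in item 4).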