Dynamic bitmap index recompression through workload-based optimizations

Proceedings. International Database Engineering and Applications Symposium. Pub Date: 2013-10-09. DOI: 10.1145/2513591.2513641

Fredton Doan, David Chiu, Brasil Perez Lukes, Jason Sawin, Gheorghi Guzun, G. Canahuate

Pages: 96-105

Citations: 1

Abstract

Many large-scale read-only databases and data warehouses use bitmap indices to speed up data analysis. These indices are both highly compressible and able to exploit fast bit-wise operations for query processing. Numerous hybrid run-length encoding schemes have been proposed that greatly compress the index and allow it to be queried without decompression. Typically, these schemes align their compression with the computer architecture's word size to further accelerate queries. Previously, we introduced Variable Length Compression (VLC), which uses a general encoding that can achieve better compression than word-aligned schemes. However, VLC's query efficiency can vary widely because compressed columns may have mismatched alignment. In this paper, we present an optimizer that recompresses the bitmap over time. Based on query history, our approach lets the VLC user specify the priority of compression versus query efficiency, and then recompresses the bitmap accordingly when warranted. In an empirical study using scientific data sets, we show that our approach achieves both better compression ratios and query speedup over WAH and PLWAH. On the largest data set, our VLC optimizer compressed up to 1.73x better than WAH and 1.46x better than PLWAH. We also show a slight improvement in query efficiency in most experiments, and observe substantial (11x to 16x) speedup in special cases.
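
As a rough illustration of the two properties the abstract attributes to bitmap indices (bit-wise query processing and run-length compressibility), the following minimal Python sketch builds one bitmap per distinct column value and answers a conjunctive query with a single AND. It is not VLC, WAH, or PLWAH: the helper names (`build_bitmap_index`, `rows_of`, `run_lengths`) and the toy data are assumptions made for this example, and the run-length step is a plain, unaligned encoding rather than any scheme from the paper.

```python
from itertools import groupby


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap):
    """Return the row numbers whose bits are set, in ascending order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


def run_lengths(bitmap, n_rows):
    """Plain run-length encoding of a bit column as (bit, run_length) pairs, row 0 first."""
    bits = [(bitmap >> i) & 1 for i in range(n_rows)]
    return [(bit, len(list(group))) for bit, group in groupby(bits)]


# Hypothetical toy table with two indexed columns.
colors = ["red", "blue", "red", "red", "green", "blue"]
sizes = ["S", "S", "L", "S", "L", "L"]

color_index = build_bitmap_index(colors)
size_index = build_bitmap_index(sizes)

# The query "color = red AND size = S" is a single bit-wise AND of two bitmaps.
matches = color_index["red"] & size_index["S"]
print(rows_of(matches))
print(run_lengths(color_index["red"], len(colors)))
```

Word-aligned schemes such as WAH pack runs like these into machine-word-sized units so that the AND can proceed word by word on the compressed form; the sketch above skips that step and operates on uncompressed integers.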

