Partitioned Bit-Packed Vectors for In-Memory-Column-Stores

Proceedings of the 3rd VLDB Workshop on In-Memory Data Management and Analytics. Pub Date: 2015-08-31. DOI: 10.1145/2803140.2803142

Martin Faust, Pedro Flemming, David Schwalb, H. Plattner



Abstract

In-memory databases have become increasingly popular in recent database development. Hardware advances of the past years have made it possible to keep ever larger data sets entirely in the main memory of one or a few machines. However, most applications on in-memory databases are memory-latency-bound rather than compute-bound, so combining strong compression techniques with efficient data structures is essential to fully utilize the hardware. A common data structure for compact storage is the bit-packed vector. It uses a fixed encoding length that cannot be changed after initialization, and therefore requires full re-initialization whenever the encoding length changes. In this paper we propose a new data structure, the partitioned bit-packed vector, in which the encoding length of the stored elements may increase dynamically while single-value access performance remains comparable. This paper outlines how the data structure is accessed and evaluates its performance characteristics. The results suggest that the partitioned bit-packed vector can improve the performance of existing in-memory column stores for typical enterprise workloads.
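The abstract does not spell out the layout of either structure, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes 64-bit storage words, a fixed number of values per partition, and one encoding length per partition. The names `BitPackedVector`, `PartitionedBitPackedVector`, and `partition_size` are hypothetical.

```python
WORD = 64                      # assumed storage word width in bits
WORD_MASK = (1 << WORD) - 1


class BitPackedVector:
    """Fixed-width vector: every value occupies exactly `bits` bits."""

    def __init__(self, bits):
        if not 1 <= bits <= WORD:
            raise ValueError("bits must be between 1 and 64")
        self.bits = bits
        self.size = 0
        self.words = []        # each entry models one 64-bit word

    def append(self, value):
        if value < 0 or value.bit_length() > self.bits:
            raise ValueError("value does not fit the fixed encoding length")
        pos = self.size * self.bits
        idx, off = divmod(pos, WORD)
        # Make sure every word touched by this value exists.
        while len(self.words) <= (pos + self.bits - 1) // WORD:
            self.words.append(0)
        self.words[idx] |= (value << off) & WORD_MASK
        spill = off + self.bits - WORD   # bits that overflow into the next word
        if spill > 0:
            self.words[idx + 1] |= value >> (self.bits - spill)
        self.size += 1

    def __getitem__(self, i):
        if not 0 <= i < self.size:
            raise IndexError(i)
        idx, off = divmod(i * self.bits, WORD)
        value = self.words[idx] >> off
        spill = off + self.bits - WORD
        if spill > 0:
            value |= self.words[idx + 1] << (self.bits - spill)
        return value & ((1 << self.bits) - 1)


class PartitionedBitPackedVector:
    """Assumed design: fixed-size partitions, each with its own bit width."""

    def __init__(self, partition_size=4, bits=1):
        self.partition_size = partition_size
        self.partitions = [BitPackedVector(bits)]

    def __len__(self):
        full = (len(self.partitions) - 1) * self.partition_size
        return full + self.partitions[-1].size

    def append(self, value):
        if value < 0:
            raise ValueError("only non-negative values are supported")
        needed = max(1, value.bit_length())
        last = self.partitions[-1]
        if last.size == self.partition_size:
            # Open a new partition; the width never shrinks in this sketch.
            last = BitPackedVector(max(last.bits, needed))
            self.partitions.append(last)
        elif needed > last.bits:
            # Re-encode only the current partition, not the whole vector.
            wider = BitPackedVector(needed)
            for j in range(last.size):
                wider.append(last[j])
            self.partitions[-1] = wider
            last = wider
        last.append(value)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        p, off = divmod(i, self.partition_size)
        return self.partitions[p][off]


pv = PartitionedBitPackedVector(partition_size=4, bits=2)
for v in [1, 2, 3, 0, 3, 100, 7, 1, 5, 2**40]:
    pv.append(v)
print([p.bits for p in pv.partitions])
```

In this sketch, a value that needs more bits than the current partition provides triggers re-encoding of at most `partition_size` values, whereas a plain bit-packed vector would have to be rebuilt in full. Single-value access costs one extra division to locate the partition. Whether the paper uses the same partitioning and growth policy is not stated in the abstract.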


