Self-organizing tuple reconstruction in column-stores

Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. Publication date: 2009-06-29. DOI: 10.1145/1559845.1559878

Stratos Idreos, M. Kersten, S. Manegold

Citations: 184

Abstract

Column-stores have gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column, allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table, such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates. In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and with frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, giving future queries faster access to less data. To enable flexible, self-organizing behavior in storage-limited environments, maps are materialized only partially, as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped, or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.
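The abstract describes cracker maps only at a high level. As a rough illustration of the underlying idea, here is a minimal, hypothetical Python sketch. It assumes a single in-memory map M_AB over two integer columns A and B, range selections of the form low <= A < high, and no updates, no partial chunks, and no alignment across multiple maps. The class name `CrackerMap`, the method `select_project`, and the index layout (sorted pivots plus boundary positions) are illustrative assumptions, not the implementation described in the paper. Without a map, such a query would select qualifying tuple IDs on A and then fetch B by position, which is the tuple-reconstruction join. With the map, each (A, B) pair is stored together, and every query partitions only the pieces its bounds fall into, so the qualifying pairs become contiguous and B can be read directly.

```python
import bisect
from typing import List, Tuple


class CrackerMap:
    """Hypothetical, simplified cracker map M_AB holding (A, B) pairs.

    The pairs are physically reorganized on A as a side effect of each
    range query, so later queries work on smaller, already-partitioned pieces.
    """

    def __init__(self, a_values: List[int], b_values: List[int]) -> None:
        # Pair A[i] with B[i] once, up front; afterwards no positional
        # join is needed because each pair moves as a unit.
        self.pairs: List[Tuple[int, int]] = list(zip(a_values, b_values))
        # Cracker index: pivots[j] is a value, positions[j] is the first
        # position whose A value is >= pivots[j]. Pieces are unsorted inside.
        self.pivots: List[int] = []
        self.positions: List[int] = []

    def _crack(self, pivot: int) -> int:
        """Ensure a piece boundary at `pivot` and return its position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        # Only the piece between the neighbouring pivots can straddle `pivot`.
        start = self.positions[i - 1] if i > 0 else 0
        end = self.positions[i] if i < len(self.positions) else len(self.pairs)
        # Two-way partition of that piece: pairs with A < pivot go to the front.
        lo, hi = start, end - 1
        while lo <= hi:
            if self.pairs[lo][0] < pivot:
                lo += 1
            else:
                self.pairs[lo], self.pairs[hi] = self.pairs[hi], self.pairs[lo]
                hi -= 1
        self.pivots.insert(i, pivot)
        self.positions.insert(i, lo)
        return lo

    def select_project(self, low: int, high: int) -> List[int]:
        """Return B for all tuples with low <= A < high (assumes low <= high)."""
        begin = self._crack(low)
        finish = self._crack(high)
        # Qualifying pairs are now contiguous; read B straight from the map.
        return [b for _, b in self.pairs[begin:finish]]


if __name__ == "__main__":
    a = [13, 4, 9, 2, 12, 7, 1, 19, 3, 14]
    b = [100 + i for i in range(len(a))]  # stand-in B values
    m = CrackerMap(a, b)
    print(sorted(m.select_project(3, 10)))  # [101, 102, 105, 108]
    print(sorted(m.select_project(5, 15)))  # [100, 102, 104, 105, 109]
```

In this sketch, a repeated query finds both of its bounds already in the cracker index and performs no reorganization. A query with new bounds partitions only the one or two pieces those bounds fall into, which loosely mirrors the "reorganize as part of query evaluation" behavior the abstract describes.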
