Vertical partitioning of relational OLTP databases using integer programming

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010). Pub Date: 2009-11-09. DOI: 10.1109/ICDEW.2010.5452739

Rasmus Resen Amossen


Citations: 19

Abstract

One way to optimize the performance of relational row-store databases is to reduce row widths by vertically partitioning tables into table fractions, minimizing the number of irrelevant columns/attributes read by each transaction. This paper considers vertical partitioning algorithms for relational row-store OLTP databases with an H-store-like architecture, meaning that we would like to maximize the number of single-sited transactions. We present a model for the vertical partitioning problem that, given a schema together with a vertical partitioning and a workload, estimates the costs (bytes read/written by storage layer access methods and bytes transferred between sites) of evaluating the workload on the given partitioning. The cost model allows for arbitrarily prioritizing load balancing of sites vs. total cost minimization. We show that finding a minimum-cost vertical partitioning in this model is NP-hard, so the problem is poorly suited to manual solution by a human DBA. We present two algorithms returning solutions in which single-sitedness of read queries is preserved while allowing column replication (which may drastically reduce cost compared to disjoint partitioning). The first algorithm is a quadratic integer program that finds optimal minimum-cost solutions with respect to the model, and the second is a more scalable heuristic based on simulated annealing. Experiments show that the algorithms can reduce the cost of the model objective by 37% when applied to the TPC-C benchmark, and that the heuristic obtains solutions with costs close to those found using the quadratic program.
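To make the idea of a simulated-annealing heuristic over a byte-count cost model concrete, here is a minimal sketch. It is not the paper's model or algorithm: it assumes a single site, disjoint fractions (no column replication), read-only queries, and a made-up fixed per-fraction overhead. The names `WIDTHS`, `WORKLOAD`, `OVERHEAD`, `cost`, and `anneal`, as well as the column names and numbers, are all hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical toy inputs (not from the paper): column widths in bytes and a
# workload of (columns read, frequency) pairs.
WIDTHS = {"id": 4, "name": 32, "balance": 8, "address": 64, "notes": 200}
WORKLOAD = [
    ({"id", "balance"}, 100),
    ({"id", "name", "address"}, 10),
    ({"id", "notes"}, 1),
]
OVERHEAD = 16  # assumed fixed bytes per fraction touched (e.g. a row header)


def cost(assign):
    """Bytes read by the workload; each fraction a query touches is read whole."""
    frac_width = {}
    for col, frac in assign.items():
        frac_width[frac] = frac_width.get(frac, 0) + WIDTHS[col]
    total = 0
    for cols, freq in WORKLOAD:
        touched = {assign[c] for c in cols}
        total += freq * sum(frac_width[f] + OVERHEAD for f in touched)
    return total


def anneal(steps=5000, t0=1000.0, cooling=0.999, seed=0):
    """Simulated annealing over column-to-fraction assignments."""
    rng = random.Random(seed)
    cols = sorted(WIDTHS)
    assign = {c: 0 for c in cols}  # start from the unpartitioned table
    cur = cost(assign)
    best, best_cost = dict(assign), cur
    t = t0
    for _ in range(steps):
        # Neighbour move: reassign one random column to a random fraction.
        col = rng.choice(cols)
        old = assign[col]
        assign[col] = rng.randrange(len(cols))
        new = cost(assign)
        # Always accept improvements; accept worse moves with
        # probability exp(-(new - cur) / t).
        if new <= cur or rng.random() < math.exp((cur - new) / t):
            cur = new
            if cur < best_cost:
                best, best_cost = dict(assign), cur
        else:
            assign[col] = old  # reject the move
        t *= cooling  # geometric cooling schedule
    return best, best_cost
```

Calling `anneal()` returns a column-to-fraction mapping and its cost under the toy model, which can be compared against `cost({c: 0 for c in WIDTHS})`, the cost of the unpartitioned table. Without the assumed `OVERHEAD` term, the cheapest layout under this toy model would trivially be one column per fraction, so some per-fraction access cost is needed for the trade-off to be meaningful.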

