Elastic Pipelining in an In-Memory Database Cluster

Proceedings of the 2016 International Conference on Management of Data
Pub Date: 2016-06-14
DOI: 10.1145/2882903.2882904

Li Wang, Minqi Zhou, Zhenjie Zhang, Y. Yang, Aoying Zhou, D. Bitton


Citations: 21

Abstract

An in-memory database cluster consists of multiple interconnected nodes, each with a large RAM capacity and modern multi-core CPUs. As a conventional query processing strategy, pipelining remains a promising solution for in-memory parallel database systems, because it avoids expensive materialization of intermediate results and parallelizes data processing across nodes. However, to fully exploit pipelining in a cluster of multi-core nodes, the query optimizer must generate good query plans with appropriate intra-node parallelism, so as to maximize CPU and network bandwidth utilization. A suboptimal plan, in contrast, causes load imbalance in the pipelines and consequently degrades query performance. Optimizing the parallelism assignment at compile time is nearly impossible, because the workload on each node is affected by numerous factors and is highly dynamic during query evaluation. To tackle this problem, we propose elastic pipelining, which makes it possible to optimize intra-node parallelism assignments in the pipelines based on the actual workload at runtime. It is achieved by adopting a new elastic iterator model and a fully optimized dynamic scheduler. The elastic iterator model extends the traditional iterator model with the ability to adjust multi-core execution dynamically. The dynamic scheduler efficiently provisions CPU cores to the query execution segments in the pipelines, based on lightweight measurements of the operators. Extensive experiments on real and synthetic (TPC-H) data show that our proposal achieves almost full CPU utilization on typical decision-making analytical queries, outperforming state-of-the-art open-source systems by a wide margin.
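The idea of provisioning cores to pipeline segments from runtime measurements can be illustrated with a minimal sketch. It is not the scheduler described in the paper: it assumes each segment reports a measured per-core processing rate, that throughput scales linearly with cores, and that the goal is to raise the throughput of the slowest segment. The function names, segment names, and rates are all hypothetical.

```python
from typing import Dict


def assign_cores(per_core_rate: Dict[str, float], total_cores: int) -> Dict[str, int]:
    """Greedy max-min allocation (illustrative assumption, not the paper's algorithm).

    per_core_rate: measured tuples/sec that ONE core achieves in each segment.
    total_cores:   cores available on the node.
    """
    if total_cores < len(per_core_rate):
        raise ValueError("need at least one core per segment")

    # Every segment needs one core to make progress at all.
    cores = {seg: 1 for seg in per_core_rate}

    # Hand each remaining core to whichever segment is currently the bottleneck,
    # i.e. has the lowest aggregate rate (cores * per-core rate).
    for _ in range(total_cores - len(per_core_rate)):
        bottleneck = min(cores, key=lambda s: cores[s] * per_core_rate[s])
        cores[bottleneck] += 1
    return cores


def pipeline_throughput(per_core_rate: Dict[str, float], cores: Dict[str, int]) -> float:
    """A pipeline runs only as fast as its slowest segment."""
    return min(cores[s] * per_core_rate[s] for s in cores)


# Hypothetical measurements: the join segment is four times slower per core than the scan.
rates = {"scan": 4.0, "join": 1.0, "agg": 2.0}
allocation = assign_cores(rates, total_cores=8)
print(allocation, pipeline_throughput(rates, allocation))
```

In an elastic setting the measurements would be refreshed during query evaluation and the allocation recomputed, so that cores shift between segments as the workload changes. The sketch shows only a single allocation step.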

