Technical Perspective: Taming Hardware Skew as Parallel DBMSs Scale Out

D. DeWitt
SIGMOD Record, p. 41. Published June 2, 2016. DOI: 10.1145/2949741.2949751

Abstract

For almost 40 years now, relational database management systems have successfully used data parallelism to speed up the evaluation of large queries. Here, by “data parallelism” we mean taking one operation (for example, a “join” or an “aggregation”) and spreading it over multiple machines, each operating on a part of the data. In general this approach works spectacularly well, yielding almost linear speedups over a wide variety of workloads. However, like any form of parallelism, data-parallel relational query processing is vulnerable to “skew.” The database literature is full of work dealing with the skew that arises when one node in a parallel system is allocated more work than the average. The following paper, by Li, Naughton, and Nehme, is interesting in that it deals with another kind of skew, one that has received much less attention: “hardware skew,” that is, skew that arises because the processing units in a parallel system are not all of equal power. Such skew can arise in several ways – for example, a parallel system could be constructed “on the fly” by allocating available nodes in a cloud, or a company could upgrade an on-premises system with the addition of new nodes that are of a different generation and class of hardware than the existing ones. If the DBMS is oblivious to the fact that the underlying system is not uniform, the result will be the same as that achieved if the system were constructed entirely of the slowest nodes in the system. If all the nodes in the system are equally “balanced” the solution is simple – if one node is 1/2 as fast as the average, give that node 1/2 the average work, and you are set. Unfortunately, in practice, things are not that simple. One node may have a faster CPU but the same I/O performance, or vice-versa; or nodes may have differing amounts of memory or network bandwidth. In such cases simple proportional allocation of work will be suboptimal. 
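The point that a single speed scalar is not enough can be made concrete. The following sketch (hypothetical node names and numbers, not from the paper) allocates data in proportion to a one-dimensional speed rating and shows how a node that is faster on paper becomes the straggler on an I/O-bound stage:

```python
# Illustrative sketch (hypothetical numbers): proportional allocation by a
# single "speed" scalar, and why it breaks down when nodes differ along
# more than one resource dimension.

def proportional_shares(speeds):
    """Give each node a fraction of the data proportional to its speed."""
    total = sum(speeds.values())
    return {node: s / total for node, s in speeds.items()}

# If skew is one-dimensional, this works: a node half as fast as the
# average gets half the average share.
speeds = {"old_node": 1.0, "new_node": 2.0}
shares = proportional_shares(speeds)
# old_node receives 1/3 of the data, new_node 2/3.

def completion_time(share, cpu_rate, io_rate, cpu_work=1.0, io_work=1.0):
    """A node finishes when both its CPU work and its I/O work are done,
    so the slower of the two resources dominates."""
    return max(share * cpu_work / cpu_rate, share * io_work / io_rate)

# Multi-dimensional skew: new_node has twice the CPU but the same disk.
# On an I/O-bound stage both nodes are equally fast, so giving new_node
# 2/3 of the data makes the "faster" node finish last.
t_old = completion_time(shares["old_node"], cpu_rate=1.0, io_rate=1.0)
t_new = completion_time(shares["new_node"], cpu_rate=2.0, io_rate=1.0)
assert t_new > t_old  # the nominally faster node is now the straggler
```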
The situation is further complicated by the fact that different queries make different demands on the system with respect to CPU, memory, network, and disk; in fact, different stages of a single query can make very different demands. This, finally, is the situation addressed by the paper, “Resource Bricolage for Parallel DBMSs on Heterogeneous Clusters.” The authors make use of techniques for cost estimation growing out of the query optimization and query running time prediction literature; they combine these techniques with a linear programming model that chooses an optimal allocation for a given query on a given system. They demonstrate through an analytic model as well as experiments with an implementation that their proposed solution dominates simpler alternatives. An interesting question this work raises is the duality between “on-demand” load balancing of the type employed by MapReduce-like systems and the predictive, up-front allocation of work advocated by this paper. My suspicion is that both approaches have their place, and the choice of which to use depends on issues such as the predictability of the workload and the importance of “locality” in the performance of the system. Perhaps hybrid solutions will be the answer in some cases.
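To give a flavor of the allocation problem the paper formulates as a linear program, here is a deliberately simplified single-stage special case in pure Python (hypothetical costs; the paper's actual model jointly optimizes over multiple stages and resources). When each node's per-unit time is governed by its bottleneck resource, minimizing the makespan has a closed form: shares inversely proportional to bottleneck cost, so all nodes finish together.

```python
# Hypothetical single-stage sketch, not the paper's model: choose each
# node's data fraction so that every node finishes at the same time.

def bottleneck_cost(cpu_cost, io_cost):
    # Per-unit processing time is dominated by the slower resource.
    return max(cpu_cost, io_cost)

def balanced_shares(costs):
    """Minimize the makespan max_i (x_i * c_i) subject to sum(x_i) = 1.
    The optimum gives each node a share inversely proportional to its
    bottleneck cost c_i, equalizing all finish times."""
    inv = {node: 1.0 / c for node, c in costs.items()}
    total = sum(inv.values())
    return {node: v / total for node, v in inv.items()}

# Two hypothetical nodes: new_node has twice the CPU but the same disk.
costs = {
    "old_node": bottleneck_cost(cpu_cost=1.0, io_cost=1.0),  # -> 1.0
    "new_node": bottleneck_cost(cpu_cost=0.5, io_cost=1.0),  # -> 1.0
}
shares = balanced_shares(costs)
# Both nodes are I/O-bound, so the optimal split is 50/50 -- not the
# 1/3 vs. 2/3 that a CPU-only speed metric would suggest.
makespan = max(shares[n] * costs[n] for n in costs)
```

The general case, with several query stages that stress CPU, memory, network, and disk differently, no longer reduces to a closed form, which is why the paper turns to a linear programming model.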