Load balancing in pipelined processing of multi-join queries

Proceedings of 1994 International Conference on Parallel and Distributed Systems. Pub Date: 1994-12-19. DOI: 10.1109/ICPADS.1994.590427

Hongjun Lu, K. Tan, Chiang Lee

Citations: 1

Abstract

This paper looks at how to exploit pipelining effectively for multi-join queries in shared-nothing systems. A multi-join query can be processed iteratively: in each iteration, several relations are selected and joined in a pipelined fashion. However, algorithms based on this approach have traditionally assumed that the relations are uniformly distributed or only slightly skewed. When this assumption is relaxed, i.e., when the data are skewed, some nodes may be assigned more data than fits in their memory. Pipelining then cannot be exploited effectively, and performance may degrade drastically. We propose four skew-handling techniques for multi-join queries. The results of a performance study show that a hybrid technique is superior in most cases.
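The abstract does not describe the four techniques, so the following is only a minimal sketch of the underlying problem, under stated assumptions. It hash-partitions join keys into buckets (the hash `key % num_buckets`, the bucket and node counts, and the per-node memory limit are all hypothetical), shows how a naive bucket-to-node mapping overloads one node under skew, and contrasts it with a generic greedy "largest bucket onto least-loaded node" reassignment. The greedy step is a standard load-balancing heuristic swapped in for illustration, not one of the paper's techniques.

```python
from collections import Counter
import heapq


def bucket_sizes(keys, num_buckets):
    """Count tuples per bucket, assuming a hypothetical hash: key % num_buckets."""
    counts = Counter(k % num_buckets for k in keys)
    return [counts.get(b, 0) for b in range(num_buckets)]


def naive_assignment(sizes, num_nodes):
    """Static mapping: bucket b goes to node b % num_nodes. Returns per-node load."""
    loads = [0] * num_nodes
    for b, size in enumerate(sizes):
        loads[b % num_nodes] += size
    return loads


def greedy_assignment(sizes, num_nodes):
    """Generic LPT heuristic: place the largest remaining bucket on the least-loaded node."""
    heap = [(0, n) for n in range(num_nodes)]  # (current load, node id)
    heapq.heapify(heap)
    loads = [0] * num_nodes
    for size in sorted(sizes, reverse=True):
        load, node = heapq.heappop(heap)
        loads[node] = load + size
        heapq.heappush(heap, (loads[node], node))
    return loads


def overflowing_nodes(loads, memory_per_node):
    """Nodes whose assigned data exceeds their (hypothetical) memory capacity."""
    return [n for n, load in enumerate(loads) if load > memory_per_node]


# Hypothetical skewed input: 1000 uniform keys plus 500 extra tuples on two hot keys.
keys = list(range(1000)) + [0] * 300 + [4] * 200
sizes = bucket_sizes(keys, num_buckets=16)
naive = naive_assignment(sizes, num_nodes=4)
greedy = greedy_assignment(sizes, num_nodes=4)
print("naive loads:", naive, "overflow:", overflowing_nodes(naive, 400))
print("greedy loads:", greedy, "overflow:", overflowing_nodes(greedy, 400))
```

With the static mapping both hot buckets land on node 0, which receives 750 tuples against a 400-tuple budget, while the greedy reassignment keeps every node under the limit. Note that bucket-level reassignment cannot help if a single key's tuples alone exceed one node's memory; handling that case requires splitting or replicating the hot key's data.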
