Comparative performance of parallel join algorithms

[1991] Proceedings of the First International Conference on Parallel and Distributed Information Systems. Pub Date: 1991-12-01. DOI: 10.1109/PDIS.1991.183070

J. Wolf, D. Dias, Philip S. Yu, John Turek

Citations: 17

Abstract

The authors recently (1990, 1991) described two new join algorithms designed to address the data skew problem. These algorithms were based, respectively, on the traditional sort merge and hash join algorithms, and employed techniques borrowed from mathematical optimization theory. The current paper proposes significant improvements to both algorithms, increasing their effectiveness while simultaneously decreasing their execution times. It then focuses on the comparative performance of the improved algorithms and their more conventional sort merge and hash counterparts. The latter two are perfectly good algorithms except that they fail to deal with data skew. Both I/O- and CPU-bound configurations were examined. The new algorithms outperform their more conventional counterparts in the presence of just about any skew at all, dramatically so in cases of high skew.
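To make the data skew problem concrete, the following minimal sketch shows how a conventional hash-partitioned join can overload one processor when a single join value is very frequent. It is not the authors' algorithm. The function names, the `key % num_procs` partitioning rule, the four-processor setup, and the sample key distributions are all assumptions chosen for illustration.

```python
from collections import Counter


def partition_loads(keys, num_procs):
    """Count the tuples each processor receives when a tuple with join key k
    is sent to processor k % num_procs (an assumed, simplistic hash)."""
    loads = Counter(key % num_procs for key in keys)
    return [loads.get(p, 0) for p in range(num_procs)]


def imbalance(loads):
    """Busiest processor's load divided by the average load.
    A value of 1.0 means the work is perfectly balanced."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical inputs: 1000 tuples each, joined on an integer key.
uniform_keys = list(range(1000))            # every key appears exactly once
skewed_keys = [7] * 500 + list(range(500))  # key 7 accounts for half the tuples

uniform_loads = partition_loads(uniform_keys, 4)
skewed_loads = partition_loads(skewed_keys, 4)

print("uniform:", uniform_loads, "imbalance", imbalance(uniform_loads))
print("skewed: ", skewed_loads, "imbalance", imbalance(skewed_loads))
```

In this toy setting the skewed input leaves one processor with several times the work of the others, so the parallel join finishes only when that processor does. This is the kind of imbalance the skew-handling algorithms in the paper are designed to avoid.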


