Load balancing in pipelined processing of multi-join queries

Hongjun Lu, K. Tan, Chiang Lee
{"title":"Load balancing in pipelined processing of multi-join queries","authors":"Hongjun Lu, K. Tan, Chiang Lee","doi":"10.1109/ICPADS.1994.590427","DOIUrl":null,"url":null,"abstract":"Looks at how to effectively exploit pipelining for multi-join queries in shared-nothing systems. A multi-join query can be processed using an iterative approach. In each iteration, several relations are selected and are joined in a pipelined fashion. However, algorithms that are based on this approach have traditionally assumed that the relations are uniformly distributed or only slightly skewed. When this assumption is relaxed, i.e. when the data is skewed, some nodes may be assigned a larger amount of data than can fit into their memories. As such, pipelining cannot be effectively exploited, and performance may degenerate drastically. We propose four skew handling techniques to deal with data skew for multi-join queries. The results of a performance study show that a hybrid technique is superior in most cases.","PeriodicalId":154429,"journal":{"name":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of 1994 International Conference on Parallel and Distributed Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS.1994.590427","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Looks at how to effectively exploit pipelining for multi-join queries in shared-nothing systems. A multi-join query can be processed using an iterative approach. In each iteration, several relations are selected and are joined in a pipelined fashion. However, algorithms that are based on this approach have traditionally assumed that the relations are uniformly distributed or only slightly skewed. When this assumption is relaxed, i.e. when the data is skewed, some nodes may be assigned a larger amount of data than can fit into their memories. As such, pipelining cannot be effectively exploited, and performance may degenerate drastically. We propose four skew handling techniques to deal with data skew for multi-join queries. The results of a performance study show that a hybrid technique is superior in most cases.
多连接查询的流水线处理中的负载平衡
介绍如何在无共享系统中有效地利用流水线进行多连接查询。可以使用迭代方法处理多连接查询。在每次迭代中,选择几个关系并以流水线方式连接起来。然而,基于这种方法的算法传统上假设这些关系是均匀分布的,或者只是轻微倾斜的。当这个假设被放宽时,即当数据被扭曲时,一些节点可能被分配的数据量超过了它们的内存。因此,无法有效地利用流水线,性能可能会急剧下降。我们提出了四种倾斜处理技术来处理多连接查询的数据倾斜。性能研究的结果表明,混合技术在大多数情况下是优越的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信