面向关系连接的倾斜不敏感并行算法

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238) Pub Date : 1998-12-17 DOI:10.1109/HIPC.1998.738010

K. Alsabti, S. Ranka

{"title":"面向关系连接的倾斜不敏感并行算法","authors":"K. Alsabti, S. Ranka","doi":"10.1109/HIPC.1998.738010","DOIUrl":null,"url":null,"abstract":"Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of the data skew. In this paper we present two new parallel join algorithms for coarse grained machines which work optimally in presence of arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both of these algorithms employ a preprocessing phase to equally partition the work among the processors. These algorithms are shown to be theoretically as well as practically scalable.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Skew-insensitive parallel algorithms for relational join\",\"authors\":\"K. Alsabti, S. Ranka\",\"doi\":\"10.1109/HIPC.1998.738010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of the data skew. In this paper we present two new parallel join algorithms for coarse grained machines which work optimally in presence of arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both of these algorithms employ a preprocessing phase to equally partition the work among the processors. These algorithms are shown to be theoretically as well as practically scalable.\",\"PeriodicalId\":175528,\"journal\":{\"name\":\"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HIPC.1998.738010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HIPC.1998.738010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

Join是关系数据库中最重要也是最昂贵的操作。并行连接操作对数据倾斜的存在非常敏感。本文提出了两种新的粗粒度机器并行连接算法，这两种算法在存在任意数量的数据倾斜的情况下工作最佳。第一种算法是基于排序的，第二种算法是基于哈希的。这两种算法都采用预处理阶段来在处理器之间平均分配工作。这些算法在理论上和实践上都具有可扩展性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Skew-insensitive parallel algorithms for relational join

Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of the data skew. In this paper we present two new parallel join algorithms for coarse grained machines which work optimally in presence of arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both of these algorithms employ a preprocessing phase to equally partition the work among the processors. These algorithms are shown to be theoretically as well as practically scalable.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)

自引率

0.00%

发文量