{"title":"面向关系连接的倾斜不敏感并行算法","authors":"K. Alsabti, S. Ranka","doi":"10.1109/HIPC.1998.738010","DOIUrl":null,"url":null,"abstract":"Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of the data skew. In this paper we present two new parallel join algorithms for coarse grained machines which work optimally in presence of arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both of these algorithms employ a preprocessing phase to equally partition the work among the processors. These algorithms are shown to be theoretically as well as practically scalable.","PeriodicalId":175528,"journal":{"name":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Skew-insensitive parallel algorithms for relational join\",\"authors\":\"K. Alsabti, S. Ranka\",\"doi\":\"10.1109/HIPC.1998.738010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of the data skew. In this paper we present two new parallel join algorithms for coarse grained machines which work optimally in presence of arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both of these algorithms employ a preprocessing phase to equally partition the work among the processors. These algorithms are shown to be theoretically as well as practically scalable.\",\"PeriodicalId\":175528,\"journal\":{\"name\":\"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1998-12-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HIPC.1998.738010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. Fifth International Conference on High Performance Computing (Cat. No. 98EX238)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HIPC.1998.738010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Skew-insensitive parallel algorithms for relational join
Join is the most important and expensive operation in relational databases. The parallel join operation is very sensitive to the presence of the data skew. In this paper we present two new parallel join algorithms for coarse grained machines which work optimally in presence of arbitrary amount of data skew. The first algorithm is sort-based and the second is hash-based. Both of these algorithms employ a preprocessing phase to equally partition the work among the processors. These algorithms are shown to be theoretically as well as practically scalable.