Skew-resistant parallel in-memory spatial join

Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management
Pub Date: 2014-06-30
DOI: 10.1145/2618243.2618262

S. Ray, Bogdan Simion, Angela Demke Brown, Ryan Johnson

Pages: 6:1-6:12

Citations: 37

Abstract

Spatial join is a crucial operation in many spatial analysis applications in scientific and geographical information systems. Due to the compute-intensive nature of spatial predicate evaluation, spatial join queries can be slow even with a moderately sized dataset. Efficient parallelization of spatial join is therefore essential to achieve acceptable performance for many spatial applications. Technological trends, including the rising core count and increasingly large main memory, hold great promise in this regard. Previous parallel spatial join approaches tried to partition the dataset so that the number of spatial objects in each partition was as equal as possible. They also focused only on the filter step. However, when the more compute-intensive refinement step is included, significant processing skew may arise due to the uneven size of the objects. This processing skew significantly limits the achievable parallel performance of spatial join queries, as the longest-running spatial partition determines the overall query execution time.

Our solution is SPINOJA, a skew-resistant parallel in-memory spatial join infrastructure. SPINOJA introduces MOD-Quadtree declustering, which partitions the spatial dataset such that the amount of computation demanded by each partition is equalized and the processing skew is minimized. We compare three work metrics used to create the partitions and three load-balancing strategies to assign the partitions to multiple cores. SPINOJA uses an in-memory column-store to store the spatial tables. Our evaluation shows that SPINOJA outperforms in-memory implementations of previous spatial join approaches by a significant margin and a recently proposed in-memory spatial join algorithm by an order of magnitude.
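The idea of partitioning by work instead of by object count can be illustrated with a small sketch. This is not the MOD-Quadtree algorithm from the paper, whose details the abstract does not give. It is a simplified illustration under stated assumptions: objects are reduced to a point plus an estimated work value (for example a vertex count, chosen here only as a hypothetical metric), a cell is quartered while its total work exceeds a budget, and partitions are assigned to cores with a greedy heaviest-first heuristic. The names `split_by_work`, `cell_work`, and `assign_lpt` are invented for this example.

```python
import heapq


def cell_work(cell):
    """Total estimated work of the objects in one cell."""
    return sum(w for _, _, w in cell)


def split_by_work(objs, box, budget, depth=0, max_depth=8):
    """Quarter `box` recursively until each cell's work fits `budget`.

    objs: list of (x, y, work) tuples; box: (x0, y0, x1, y1).
    Assumption: each object is a point, so it falls in exactly one cell.
    Returns a list of non-empty cells, each a list of objects.
    """
    if not objs:
        return []
    if cell_work(objs) <= budget or len(objs) == 1 or depth == max_depth:
        return [objs]
    x0, y0, x1, y1 = box
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quads = {}
    for obj in objs:
        key = (obj[0] >= mx, obj[1] >= my)  # (right half?, top half?)
        quads.setdefault(key, []).append(obj)
    cells = []
    for (right, top), members in quads.items():
        sub = (mx if right else x0, my if top else y0,
               x1 if right else mx, y1 if top else my)
        cells += split_by_work(members, sub, budget, depth + 1, max_depth)
    return cells


def assign_lpt(partition_work, cores):
    """Greedy assignment: heaviest partition first, to the least-loaded core.

    Returns the total work placed on each core.
    """
    heap = [(0, c) for c in range(cores)]
    heapq.heapify(heap)
    loads = [0] * cores
    for w in sorted(partition_work, reverse=True):
        load, core = heapq.heappop(heap)
        loads[core] = load + w
        heapq.heappush(heap, (loads[core], core))
    return loads


# Two expensive objects sit close together; six cheap ones are spread out.
objs = [(0.1, 0.1, 100), (0.2, 0.2, 100),
        (0.6, 0.1, 1), (0.7, 0.2, 1),
        (0.1, 0.7, 1), (0.2, 0.8, 1),
        (0.6, 0.6, 1), (0.7, 0.7, 1)]

cells = split_by_work(objs, (0.0, 0.0, 1.0, 1.0), budget=110)
loads = assign_lpt([cell_work(c) for c in cells], cores=2)
print("work per cell:", sorted(cell_work(c) for c in cells))
print("work per core:", loads)
```

In this toy input, a split that only equalizes object counts could place both expensive objects in the same partition, and that partition would then dominate the running time. Splitting on work separates them, so the per-core totals end up close to each other. Real spatial objects have extent and may overlap several cells, which this sketch deliberately ignores.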


