{"title":"SATO: a spatial data partitioning framework for scalable query processing","authors":"Hoang Vo, Ablimit Aji, Fusheng Wang","doi":"10.1145/2666310.2666365","DOIUrl":null,"url":null,"abstract":"Scalable spatial query processing relies on effective spatial data partitioning for query parallelization, data pruning, and load balancing. These are often challenged by the intrinsic characteristics of spatial data, such as high skew in data distribution and high complexity of irregular multi-dimensional objects. In this demo, we present SATO, a spatial data partitioning framework that can quickly analyze and partition spatial data with an optimal spatial partitioning strategy for scalable query processing. SATO works in following steps: 1) Sample, which samples a small fraction of input data for analysis, 2) Analyze, which quickly analyzes sampled data to find an optimal partition strategy, 3) Tear, which provides data skew aware partitioning and supports MapReduce based scalable partitioning, and 4) Optimize, which collects succinct partition statistics for potential query optimization. SATO also provides multiple level partitioning, which can be used to significantly improve window based queries in cloud based spatial query processing systems. SATO comes with a visualization component that provides heat maps and histograms for qualitative evaluation. SATO has been implemented within the Hadoop-GIS, a high performance spatial data warehousing system over MapReduce. SATO is also released as an independent software package to support various scalable spatial query processing systems. Our experiments have demonstrated that SATO can generate much balanced partitioning that can significantly improve spatial query performance with MapReduce comparing to traditional spatial partitioning approaches.","PeriodicalId":153031,"journal":{"name":"Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","volume":"214 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"69","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2666310.2666365","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 69
Abstract
Scalable spatial query processing relies on effective spatial data partitioning for query parallelization, data pruning, and load balancing. These are often challenged by the intrinsic characteristics of spatial data, such as high skew in data distribution and high complexity of irregular multi-dimensional objects. In this demo, we present SATO, a spatial data partitioning framework that can quickly analyze and partition spatial data with an optimal spatial partitioning strategy for scalable query processing. SATO works in following steps: 1) Sample, which samples a small fraction of input data for analysis, 2) Analyze, which quickly analyzes sampled data to find an optimal partition strategy, 3) Tear, which provides data skew aware partitioning and supports MapReduce based scalable partitioning, and 4) Optimize, which collects succinct partition statistics for potential query optimization. SATO also provides multiple level partitioning, which can be used to significantly improve window based queries in cloud based spatial query processing systems. SATO comes with a visualization component that provides heat maps and histograms for qualitative evaluation. SATO has been implemented within the Hadoop-GIS, a high performance spatial data warehousing system over MapReduce. SATO is also released as an independent software package to support various scalable spatial query processing systems. Our experiments have demonstrated that SATO can generate much balanced partitioning that can significantly improve spatial query performance with MapReduce comparing to traditional spatial partitioning approaches.