基于排序的r树并行加载

International Workshop on Analytics for Big Geospatial Data Pub Date : 2012-11-06 DOI:10.1145/2447481.2447489

Daniar Achakeev, M. Seidemann, Markus Schmidt, B. Seeger

{"title":"基于排序的r树并行加载","authors":"Daniar Achakeev, M. Seidemann, Markus Schmidt, B. Seeger","doi":"10.1145/2447481.2447489","DOIUrl":null,"url":null,"abstract":"Due to the increasing amount of spatial data, parallel algorithms for processing big spatial data become more and more important. In particular, the shared nothing architecture is attractive as it offers low cost data processing. Moreover, popular MapReduce frameworks such as Hadoop allow developing conceptually simple and scalable algorithms for processing big data using this architecture. In this work we address the problem of parallel loading of R-trees on a shared-nothing platform. The R-tree is a key element for efficient query processing in large spatial database, but its creation is expensive. We proposed a novel scalable parallel loading algorithm for MapReduce. The core of our parallel loading is the state of the art sequential sort-based query-adaptive R-tree loading algorithm that builds R-trees optimized according to a commonly used cost model. In contrast to previous methods for loading R-trees with MapReduce we construct the R-tree level-wise. Our experimental results show an almost linear speedup in the number of machines. Moreover, the resulting R-trees provide a better query performance than R-trees build by other competitive bulk-loading algorithms.","PeriodicalId":416086,"journal":{"name":"International Workshop on Analytics for Big Geospatial Data","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Sort-based parallel loading of R-trees\",\"authors\":\"Daniar Achakeev, M. Seidemann, Markus Schmidt, B. Seeger\",\"doi\":\"10.1145/2447481.2447489\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the increasing amount of spatial data, parallel algorithms for processing big spatial data become more and more important. In particular, the shared nothing architecture is attractive as it offers low cost data processing. Moreover, popular MapReduce frameworks such as Hadoop allow developing conceptually simple and scalable algorithms for processing big data using this architecture. In this work we address the problem of parallel loading of R-trees on a shared-nothing platform. The R-tree is a key element for efficient query processing in large spatial database, but its creation is expensive. We proposed a novel scalable parallel loading algorithm for MapReduce. The core of our parallel loading is the state of the art sequential sort-based query-adaptive R-tree loading algorithm that builds R-trees optimized according to a commonly used cost model. In contrast to previous methods for loading R-trees with MapReduce we construct the R-tree level-wise. Our experimental results show an almost linear speedup in the number of machines. Moreover, the resulting R-trees provide a better query performance than R-trees build by other competitive bulk-loading algorithms.\",\"PeriodicalId\":416086,\"journal\":{\"name\":\"International Workshop on Analytics for Big Geospatial Data\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Workshop on Analytics for Big Geospatial Data\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2447481.2447489\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Workshop on Analytics for Big Geospatial Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2447481.2447489","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

摘要

随着空间数据量的不断增加，处理大空间数据的并行算法变得越来越重要。特别地，无共享架构很有吸引力，因为它提供了低成本的数据处理。此外，流行的MapReduce框架(如Hadoop)允许开发概念简单且可扩展的算法，用于使用该架构处理大数据。在这项工作中，我们解决了在无共享平台上并行加载r树的问题。r树是大型空间数据库中高效查询处理的关键元素，但其创建成本较高。提出了一种新的MapReduce可扩展并行加载算法。我们的并行加载的核心是最先进的基于顺序排序的查询自适应r树加载算法，该算法根据常用的成本模型构建优化的r树。与之前用MapReduce加载r树的方法不同，我们分层构建r树。我们的实验结果表明，机器数量几乎呈线性增长。此外，生成的r树提供了比其他竞争性大容量加载算法构建的r树更好的查询性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Sort-based parallel loading of R-trees

Due to the increasing amount of spatial data, parallel algorithms for processing big spatial data become more and more important. In particular, the shared nothing architecture is attractive as it offers low cost data processing. Moreover, popular MapReduce frameworks such as Hadoop allow developing conceptually simple and scalable algorithms for processing big data using this architecture. In this work we address the problem of parallel loading of R-trees on a shared-nothing platform. The R-tree is a key element for efficient query processing in large spatial database, but its creation is expensive. We proposed a novel scalable parallel loading algorithm for MapReduce. The core of our parallel loading is the state of the art sequential sort-based query-adaptive R-tree loading algorithm that builds R-trees optimized according to a commonly used cost model. In contrast to previous methods for loading R-trees with MapReduce we construct the R-tree level-wise. Our experimental results show an almost linear speedup in the number of machines. Moreover, the resulting R-trees provide a better query performance than R-trees build by other competitive bulk-loading algorithms.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Workshop on Analytics for Big Geospatial Data

自引率

0.00%

发文量