Comparative Evaluation of Various Indexing Techniques of Geospatial Vector Data for Processing in Distributed Computing Environment

Proceedings of the 9th Annual ACM India Conference. Pub Date: 2016-10-21. DOI: 10.1145/2998476.2998493

Abdul Jhummarwala, Mazin Alkathiri, Miren Karamta, M. Potdar

Citations: 1

Abstract

The rapid growth of geospatial data makes it challenging to maintain in spatial databases and to process with traditional spatial data processing methods. The sheer volume and complexity of spatial databases make them ideal candidates for parallel and distributed processing architectures. There is considerable enthusiasm for using the MapReduce paradigm and distributed computing to process large volumes of vector data. Because spatial data cannot be indexed using the traditional B-tree structures used by R/DBMS, several libraries such as JSI (Java Spatial Index), libspatialindex and SpatiaLite depend on advanced data structures such as the R-tree, R*-tree, Quad-tree and their variants for spatial indexing. These indexing mechanisms have also been natively incorporated into frameworks such as Spatial Hadoop, Hadoop GIS SATO and GeoSpark. Additionally, the most widely used open source RDBMS, such as MySQL, Postgres and SQLite, incorporate spatial indexing through extensions/add-ons. In this paper, we benchmark and compare the performance of various spatial indexing mechanisms and evaluate the performance of distributed frameworks on planet-sized datasets. We conclude by highlighting the characteristics of spatial tools and frameworks to support better selection and implementation of R-tree indexing in a big geospatial processing system.
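As an illustration of the kind of structure the abstract refers to, the following is a minimal sketch of a point quadtree with a rectangular window query. It is not taken from the paper or from any of the libraries named above; the names `Rect`, `QuadTree`, `capacity` and `max_depth` are hypothetical, and the sketch indexes points only, whereas real vector data also includes lines and polygons, which are typically indexed by their bounding boxes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Rect:
    """Axis-aligned rectangle, half-open: [xmin, xmax) x [ymin, ymax)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] < self.xmax and self.ymin <= p[1] < self.ymax

    def intersects(self, other: "Rect") -> bool:
        return (self.xmin < other.xmax and other.xmin < self.xmax
                and self.ymin < other.ymax and other.ymin < self.ymax)


class QuadTree:
    """Illustrative point quadtree (hypothetical, not a library API)."""

    def __init__(self, bounds: Rect, capacity: int = 4,
                 max_depth: int = 16, depth: int = 0):
        self.bounds = bounds
        self.capacity = capacity      # points held by a leaf before it splits
        self.max_depth = max_depth    # guards against endless splitting on duplicates
        self.depth = depth
        self.points: List[Point] = []
        self.children: Optional[List["QuadTree"]] = None

    def insert(self, p: Point) -> bool:
        """Insert p; return False if p lies outside this node's bounds."""
        if not self.bounds.contains(p):
            return False
        if self.children is None:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        # any() stops at the first child that accepts the point.
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        """Turn this leaf into an internal node with four quadrants."""
        b = self.bounds
        cx, cy = (b.xmin + b.xmax) / 2, (b.ymin + b.ymax) / 2
        quadrants = [
            Rect(b.xmin, b.ymin, cx, cy), Rect(cx, b.ymin, b.xmax, cy),
            Rect(b.xmin, cy, cx, b.ymax), Rect(cx, cy, b.xmax, b.ymax),
        ]
        self.children = [
            QuadTree(q, self.capacity, self.max_depth, self.depth + 1)
            for q in quadrants
        ]
        old_points, self.points = self.points, []
        for q in old_points:
            any(child.insert(q) for child in self.children)

    def query(self, window: Rect) -> List[Point]:
        """Return all stored points inside window, skipping disjoint subtrees."""
        if not self.bounds.intersects(window):
            return []
        found = [p for p in self.points if window.contains(p)]
        if self.children is not None:
            for child in self.children:
                found.extend(child.query(window))
        return found


if __name__ == "__main__":
    tree = QuadTree(Rect(0, 0, 10, 10))
    for x in range(10):
        for y in range(10):
            tree.insert((float(x), float(y)))
    print(sorted(tree.query(Rect(2, 2, 5, 5))))
```

The window query prunes whole quadrants whose bounds do not intersect the window. A one-dimensional B-tree ordering has no direct equivalent of this pruning for two-dimensional range queries, which is the motivation the abstract gives for R-tree and Quad-tree variants.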
