Abdul Jhummarwala, Mazin Alkathiri, Miren Karamta, M. Potdar
Comparative Evaluation of Various Indexing Techniques of Geospatial Vector Data for Processing in Distributed Computing Environment
Proceedings of the 9th Annual ACM India Conference, 2016-10-21
DOI: 10.1145/2998476.2998493 (https://doi.org/10.1145/2998476.2998493)
Citations: 1
Abstract
The rapid growth of geospatial data makes it difficult to maintain in spatial databases and to process with traditional methods of spatial data processing. The sheer volume and complexity of spatial databases make them ideal candidates for parallel and distributed processing architectures, and there is considerable interest in using the MapReduce paradigm and distributed computing to process large volumes of vector data. Because spatial data cannot be indexed using the traditional B-tree structures used by RDBMSs, several libraries such as JSI (Java Spatial Index), libspatialindex and SpatiaLite depend on advanced data structures such as the R-tree, R*-tree, Quad-tree and their variants for spatial indexing. These indexing mechanisms have also been natively incorporated in frameworks such as Spatial Hadoop, Hadoop GIS SATO and GeoSpark. Additionally, the most widely used open source RDBMSs, such as MySQL, Postgres and SQLite, incorporate spatial indexing through extensions or add-ons. In this paper, we benchmark and compare the performance of various spatial indexing mechanisms and evaluate the performance of distributed frameworks on planet-sized datasets. We conclude by highlighting the characteristics of spatial tools and frameworks to support better selection and implementation of R-tree indexing in a big geospatial processing system.
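To make the idea of hierarchical spatial indexing concrete, the following is a minimal sketch of a point Quad-tree with a window (bounding-box) query. It is not taken from the paper or from any of the libraries named above; the `Box` and `QuadTree` classes, the `capacity` and `max_depth` parameters, and the sample grid data are all assumptions introduced for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Box:
    """Axis-aligned rectangle (hypothetical helper, inclusive on all edges)."""

    def __init__(self, xmin: float, ymin: float, xmax: float, ymax: float):
        self.xmin, self.ymin, self.xmax, self.ymax = xmin, ymin, xmax, ymax

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax

    def intersects(self, other: "Box") -> bool:
        return not (other.xmin > self.xmax or other.xmax < self.xmin
                    or other.ymin > self.ymax or other.ymax < self.ymin)


class QuadTree:
    """Illustrative point quad-tree: a leaf splits into four children when full."""

    def __init__(self, bounds: Box, capacity: int = 4,
                 depth: int = 0, max_depth: int = 16):
        self.bounds = bounds
        self.capacity = capacity
        self.depth = depth
        self.max_depth = max_depth  # guards against endless splitting on duplicates
        self.points: List[Point] = []
        self.children: List["QuadTree"] = []

    def insert(self, p: Point) -> bool:
        if not self.bounds.contains(p):
            return False
        if not self.children:
            if len(self.points) < self.capacity or self.depth >= self.max_depth:
                self.points.append(p)
                return True
            self._split()
        # Each point is stored in the first child whose box contains it.
        return any(child.insert(p) for child in self.children)

    def _split(self) -> None:
        b = self.bounds
        mx, my = (b.xmin + b.xmax) / 2, (b.ymin + b.ymax) / 2
        quadrants = [Box(b.xmin, b.ymin, mx, my), Box(mx, b.ymin, b.xmax, my),
                     Box(b.xmin, my, mx, b.ymax), Box(mx, my, b.xmax, b.ymax)]
        self.children = [QuadTree(q, self.capacity, self.depth + 1, self.max_depth)
                         for q in quadrants]
        old, self.points = self.points, []
        for p in old:  # push existing points down into the new children
            any(child.insert(p) for child in self.children)

    def query(self, window: Box, out: Optional[List[Point]] = None) -> List[Point]:
        if out is None:
            out = []
        if not self.bounds.intersects(window):
            return out  # prune the whole subtree
        out.extend(p for p in self.points if window.contains(p))
        for child in self.children:
            child.query(window, out)
        return out


# Sample data: a 20 x 20 grid of integer points (made up for this sketch).
all_points = [(float(i), float(j)) for i in range(20) for j in range(20)]
tree = QuadTree(Box(0.0, 0.0, 20.0, 20.0))
for pt in all_points:
    tree.insert(pt)

window = Box(2.5, 2.5, 5.5, 5.5)
hits = tree.query(window)
brute_force = [p for p in all_points if window.contains(p)]
```

The query skips any subtree whose bounding box does not intersect the search window, which is the property a one-dimensional B-tree ordering cannot provide for two-dimensional extents. R-trees apply the same pruning idea but group objects by data-driven bounding rectangles rather than by fixed quadrants.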