Relaxing the data access bottleneck of geographic big-data analytics applications using distributed quad trees
Mayumbo Nyirenda, Hiroki Arimura, Kimihito Ito
2016 5th International Conference on Multimedia Computing and Systems (ICMCS), September 2016
DOI: 10.1109/ICMCS.2016.7905524
Citations: 2
Abstract
Access to massive collections of geographic spatial data is a serious bottleneck in large-scale, data-centric applications of the big data era, such as data assimilation and urban data analytics systems. In this paper, we consider the implementation of distributed spatial indices, specifically quad trees, on a distributed computing system with a shared-nothing memory architecture. We discuss static and dynamic strategies for partitioning data and queries across distributed nodes and for allocating them to those nodes. Using scaled-down parallel data-load and search experiments on a small distributed processor system as a proof of concept, we show that the proposed approach, which uses a collection of small indices held in distributed shared-nothing memory, is more efficient than the conventional approach of a single processor with one large external index. We also observed that the proposed tree-based partitioning and assignment strategy using sampling reduces query time compared with other conventional partitioning strategies used in databases. In addition, we discuss how to allocate a collection of small tree indices among distributed processors. These results suggest that parallelized access to databases with spatial indexing functions can enhance the throughput of large-scale data-centric applications.
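The abstract does not specify how the sampling-based, tree-based partitioning and assignment works. As a rough illustration only, the Python sketch below shows one plausible reading under our own assumptions: a quad tree built over a random sample of the points defines the partition regions, each region is assigned to a node by a greedy least-loaded heuristic (our choice, not necessarily the authors'), every node indexes its own points in a small local quad tree, and a range query is sent only to the nodes whose regions overlap it. All names (`QuadTree`, `sample_partitions`, `assign`, `DistributedIndex`, `MAX_DEPTH`) are hypothetical, and the shared-nothing nodes are simulated with threads in a single process.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field

# Hypothetical sketch, not the paper's implementation; all names are illustrative.
# Rectangle (xmin, ymin, xmax, ymax), half-open on the max side so quadrants tile it.
Rect = tuple[float, float, float, float]
MAX_DEPTH = 20  # guard against endless splitting on duplicate points

def contains(r: Rect, x: float, y: float) -> bool:
    return r[0] <= x < r[2] and r[1] <= y < r[3]

def intersects(a: Rect, b: Rect) -> bool:
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def quadrants(r: Rect) -> list[Rect]:
    mx, my = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
    return [(r[0], r[1], mx, my), (mx, r[1], r[2], my),
            (r[0], my, mx, r[3]), (mx, my, r[2], r[3])]

@dataclass
class QuadTree:
    """Point quad tree; a leaf splits into four quadrants when it overflows."""
    bounds: Rect
    capacity: int = 8
    depth: int = 0
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, x: float, y: float) -> bool:
        if not contains(self.bounds, x, y):
            return False
        if self.children:
            return any(c.insert(x, y) for c in self.children)
        self.points.append((x, y))
        if len(self.points) > self.capacity and self.depth < MAX_DEPTH:
            self.children = [QuadTree(q, self.capacity, self.depth + 1)
                             for q in quadrants(self.bounds)]
            for px, py in self.points:  # push stored points down one level
                any(c.insert(px, py) for c in self.children)
            self.points = []
        return True

    def query(self, q: Rect) -> list:
        if not intersects(self.bounds, q):
            return []
        if self.children:
            return [p for c in self.children for p in c.query(q)]
        return [p for p in self.points if contains(q, *p)]

    def leaves(self) -> list:
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

def sample_partitions(points, bounds: Rect, n_parts: int, sample_size: int, seed: int = 0):
    """Quad tree over a random sample; each leaf region becomes a partition,
    paired with its sampled point count as a load estimate."""
    sample = random.Random(seed).sample(points, min(sample_size, len(points)))
    tree = QuadTree(bounds, capacity=max(1, len(sample) // n_parts))
    for x, y in sample:
        tree.insert(x, y)
    return [(leaf.bounds, len(leaf.points)) for leaf in tree.leaves()]

def assign(partitions, n_nodes: int) -> dict:
    """Greedy balancing: the largest estimated partition goes to the least-loaded node."""
    loads, owner = [0] * n_nodes, {}
    for rect, est in sorted(partitions, key=lambda p: -p[1]):
        node = min(range(n_nodes), key=loads.__getitem__)
        owner[rect] = node
        loads[node] += est
    return owner

class DistributedIndex:
    """Shared-nothing nodes simulated in one process: each node keeps its own
    small quad trees, one per region it owns."""

    def __init__(self, owner: dict, n_nodes: int, capacity: int = 8):
        self.nodes = [{} for _ in range(n_nodes)]
        for rect, node in owner.items():
            self.nodes[node][rect] = QuadTree(rect, capacity)

    def load(self, points) -> None:
        regions = [(r, t) for shard in self.nodes for r, t in shard.items()]
        for x, y in points:
            for r, t in regions:
                if contains(r, x, y):  # regions tile the root, so at most one match
                    t.insert(x, y)
                    break

    def range_query(self, q: Rect) -> list:
        # Send q only to nodes owning a region that overlaps it.
        targets = [s for s in self.nodes if any(intersects(r, q) for r in s)]
        def local(shard):
            return [p for r, t in shard.items() if intersects(r, q) for p in t.query(q)]
        # Threads stand in for remote nodes: they show the fan-out, not real speedup.
        with ThreadPoolExecutor() as pool:
            return [p for part in pool.map(local, targets) for p in part]
```

Because the sampled counts drive both the leaf splits and the load estimates, dense areas end up as more, smaller regions, which is one common motivation for sampling-based partitioning over a fixed uniform grid. A real deployment would replace the thread pool with remote calls and would need its own policy for points outside the root bounds, which this sketch simply skips during loading.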