Relaxing the data access bottleneck of geographic big-data analytics applications using distributed quad trees

2016 5th International Conference on Multimedia Computing and Systems (ICMCS) Pub Date : 2016-09-01 DOI:10.1109/ICMCS.2016.7905524

Mayumbo Nyirenda, Hiroki Arimura, Kimihito Ito


Citations: 2

Abstract

Access to massive collections of geographic spatial data is a serious bottleneck in large-scale data-centric applications of the big data era, such as data assimilation and urban data analytics systems. In this paper, we consider how to implement distributed spatial indices, specifically quad trees, on a distributed computing system using the shared-nothing memory approach. We discuss static and dynamic strategies for partitioning and allocating data and queries across distributed nodes. Using scaled-down parallel data load and search experiments on a small distributed processor system as a proof of concept, we show that the proposed approach, which uses a collection of small indices held in distributed shared-nothing memory, is more efficient than the conventional approach of a single processor with a large external index. We also observed that the proposed tree-based partitioning and assignment strategy using sampling reduces query time compared with other conventional partitioning strategies used in databases. We also discuss how to allocate a collection of small tree indices among distributed processors. These results suggest that parallelized access to databases with spatial indexing functions can enhance the throughput of large-scale data-centric applications.
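The abstract does not spell out the partitioning algorithm, so the following is only an illustrative sketch of what a sampling-based, tree-based partitioning and assignment strategy might look like, not the authors' method. It assumes 2D points in the unit square, builds a quad tree over a random sample, treats each leaf cell as a partition, and assigns leaves to nodes greedily by sample count. All names (`Cell`, `split_on_sample`, `assign_leaves`, `route_query`), the leaf capacity, the sample size, and the greedy balancing rule are hypothetical choices made for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical quad-tree cell covering [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1

    def intersects(self, qx0, qy0, qx1, qy1):
        return self.x0 < qx1 and qx0 < self.x1 and self.y0 < qy1 and qy0 < self.y1


def split_on_sample(cell, capacity, depth=0, max_depth=16):
    """Split a cell into four quadrants until it holds at most `capacity` sample points."""
    if len(cell.points) <= capacity or depth >= max_depth:
        return
    mx, my = (cell.x0 + cell.x1) / 2, (cell.y0 + cell.y1) / 2
    cell.children = [
        Cell(cell.x0, cell.y0, mx, my), Cell(mx, cell.y0, cell.x1, my),
        Cell(cell.x0, my, mx, cell.y1), Cell(mx, my, cell.x1, cell.y1),
    ]
    for p in cell.points:
        for child in cell.children:
            if child.contains(*p):
                child.points.append(p)
                break
    cell.points = []  # internal cells keep no points
    for child in cell.children:
        split_on_sample(child, capacity, depth + 1, max_depth)


def leaves(cell):
    """Return the leaf cells, which together tile the root cell."""
    if not cell.children:
        return [cell]
    return [leaf for child in cell.children for leaf in leaves(child)]


def assign_leaves(leaf_cells, num_nodes):
    """Assumed greedy rule: heaviest leaf first, to the currently lightest node."""
    load = [0] * num_nodes
    owner = [0] * len(leaf_cells)
    order = sorted(range(len(leaf_cells)), key=lambda i: -len(leaf_cells[i].points))
    for i in order:
        node = load.index(min(load))
        owner[i] = node
        load[node] += len(leaf_cells[i].points)
    return owner, load


def route_query(leaf_cells, owner, rect):
    """Return the set of nodes whose partitions overlap the query rectangle."""
    return {owner[i] for i, leaf in enumerate(leaf_cells) if leaf.intersects(*rect)}


random.seed(0)
data = [(random.random(), random.random()) for _ in range(10_000)]
sample = random.sample(data, 500)  # partition boundaries come from the sample only
root = Cell(0.0, 0.0, 1.0, 1.0, points=list(sample))
split_on_sample(root, capacity=40)
parts = leaves(root)
owner, load = assign_leaves(parts, num_nodes=4)
nodes = route_query(parts, owner, (0.10, 0.10, 0.15, 0.15))
print(len(parts), "partitions; sample load per node:", load)
print("query is sent to nodes:", sorted(nodes))
```

In a shared-nothing setting along these lines, each node would build its own small quad tree over only the points that fall in its assigned cells, and a range query would be forwarded only to the nodes returned by `route_query`. Because the boundaries are derived from a sample, the full data set never has to be scanned on a single machine before loading.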

