Relaxing the data access bottleneck of geographic big-data analytics applications using distributed quad trees

Mayumbo Nyirenda, Hiroki Arimura, Kimihito Ito
2016 5th International Conference on Multimedia Computing and Systems (ICMCS). Published 2016-09-01. DOI: 10.1109/ICMCS.2016.7905524 (https://doi.org/10.1109/ICMCS.2016.7905524)
Citations: 2

Abstract

Access to massive collections of geographic spatial data is one of the serious bottlenecks in large-scale data-centric applications of the big data era, such as data assimilation and urban data analytics systems. In this paper, we consider the implementation of distributed spatial indices, specifically quad trees, on a distributed computing system using the shared-nothing memory approach. We discuss static and dynamic strategies for partitioning and allocating data and queries across distributed nodes. Using scaled-down parallel data-load and search experiments on a small distributed processor system as a proof of concept, we show that the proposed approach, which uses a collection of small indices held in distributed shared-nothing memory, is more efficient than the conventional approach of a single processor with one large external index. We also observed that the proposed tree-based partitioning and assignment strategy, which uses sampling, yields lower query times than other conventional partitioning strategies used in databases. We also discuss how to allocate a collection of small tree indices among distributed processors. These results suggest that parallelized access to databases with spatial indexing functions can enhance the throughput of large-scale data-centric applications.
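The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of one way a sampling-based quad-tree partitioning and assignment scheme could look. It assumes a quad tree built over a random sample of points, whose leaves become partitions, and a greedy heaviest-first assignment of leaves to nodes. All names (`Cell`, `build`, `assign`, `route`) and parameters (`capacity`, `n_nodes`) are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One quad-tree cell covering [x0, x1) x [y0, y1)."""
    x0: float
    y0: float
    x1: float
    y1: float
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def child_for(self, x, y):
        # Child index: bit 0 = high-x half, bit 1 = high-y half.
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(1 if x >= mx else 0) + (2 if y >= my else 0)]


def build(cell, capacity, depth=0, max_depth=16):
    """Split recursively until a cell holds at most `capacity` sample points."""
    if len(cell.points) <= capacity or depth == max_depth:
        return
    mx, my = (cell.x0 + cell.x1) / 2, (cell.y0 + cell.y1) / 2
    cell.children = [
        Cell(cell.x0, cell.y0, mx, my),  # low x, low y
        Cell(mx, cell.y0, cell.x1, my),  # high x, low y
        Cell(cell.x0, my, mx, cell.y1),  # low x, high y
        Cell(mx, my, cell.x1, cell.y1),  # high x, high y
    ]
    for x, y in cell.points:
        cell.child_for(x, y).points.append((x, y))
    cell.points = []
    for child in cell.children:
        build(child, capacity, depth + 1, max_depth)


def leaves(cell):
    """Return all leaf cells; each leaf is one candidate partition."""
    if not cell.children:
        return [cell]
    return [leaf for c in cell.children for leaf in leaves(c)]


def assign(leaf_cells, n_nodes):
    """Greedy assumption: heaviest leaf first, to the least-loaded node."""
    load = [0] * n_nodes
    owner = {}
    for leaf in sorted(leaf_cells, key=lambda c: len(c.points), reverse=True):
        node = min(range(n_nodes), key=load.__getitem__)
        owner[id(leaf)] = node
        load[node] += len(leaf.points)
    return owner, load


def route(root, owner, x, y):
    """Find the node that owns the leaf containing point (x, y)."""
    cell = root
    while cell.children:
        cell = cell.child_for(x, y)
    return owner[id(cell)]


# Usage: partition the unit square from a 2000-point sample across 4 nodes.
rng = random.Random(0)
sample = [(rng.random(), rng.random()) for _ in range(2000)]
root = Cell(0.0, 0.0, 1.0, 1.0, points=list(sample))
build(root, capacity=50)
owner, load = assign(leaves(root), n_nodes=4)
target_node = route(root, owner, 0.25, 0.75)
```

In this sketch, each node would then build its own small index over the leaves it owns, and both inserts and point queries would be sent to the node returned by `route`. Splitting on the sample rather than on the full data set keeps the partitioning step cheap while still adapting cell sizes to the data density.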