Speeding up construction of distributed quadtrees for big-data analytics applications using dilated integers and hashmaps

2017 International Symposium on Networks, Computers and Communications (ISNCC). Pub Date: 2017-05-01. DOI: 10.1109/ISNCC.2017.8072032

Mayumbo Nyirenda, David Zulu


Citations: 1

Abstract

Fast retrieval and insertion of data is critical to spatial big-data analytics applications, yet data access remains one of the bottlenecks in large-scale spatial data-centric applications. Distributed spatial indexing structures such as quadtrees have been proposed to help alleviate this bottleneck. Some of the proposed solutions use a static sample of the data to build a quadtree that serves as a directory structure for locating distributed data servers. In this paper, we consider query redirection both during construction of the distributed quadtree and during data retrieval. We propose exploiting the static nature of the data sample, together with hashmaps and dilated integers, to speed up traversal of the directory. We conduct experiments on construction and data querying and show that both construction and querying performance improve threefold compared with the previously proposed approach. Further experiments show that the new approach is much less sensitive to data skewness. Overall, our results show that dilated integers coupled with hashmaps can improve the performance of distributed spatial indexing structures used to alleviate the data access bottleneck in big-data spatial analytics.
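The abstract does not spell out the data layout, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a directory keyed by (depth, Morton code), where the Morton code is formed by interleaving two dilated integers, and it assumes integer grid coordinates at a fixed maximum depth. The names `dilate16`, `morton`, `build_directory`, and `locate` are hypothetical and chosen for illustration.

```python
def dilate16(x: int) -> int:
    """Dilate the low 16 bits of x: bit i moves to bit 2*i, with zeros between."""
    x &= 0xFFFF
    x = (x | (x << 8)) & 0x00FF00FF
    x = (x | (x << 4)) & 0x0F0F0F0F
    x = (x | (x << 2)) & 0x33333333
    x = (x | (x << 1)) & 0x55555555
    return x


def morton(col: int, row: int) -> int:
    """Interleave two dilated integers: col takes even bits, row takes odd bits."""
    return dilate16(col) | (dilate16(row) << 1)


def build_directory(leaves):
    """Build a hashmap from (depth, Morton code) to a server name.

    `leaves` holds (depth, col, row, server) tuples, where col and row are the
    cell coordinates of a quadtree leaf at that depth (an assumed input format).
    """
    return {(depth, morton(col, row)): server
            for depth, col, row, server in leaves}


def locate(directory, x: int, y: int, max_depth: int):
    """Find the server owning grid point (x, y), given at max_depth resolution.

    Shifting the Morton code right by two bits yields the code of the parent
    cell, so each level costs one shift and one hashmap probe instead of a
    pointer-chasing descent through tree nodes.
    """
    code = morton(x, y)
    for depth in range(max_depth + 1):
        server = directory.get((depth, code >> (2 * (max_depth - depth))))
        if server is not None:
            return server
    return None


# A 4x4 grid (max depth 2): three quadrants are leaves at depth 1, and the
# fourth quadrant is split into four depth-2 leaves.
example = build_directory([
    (1, 0, 0, "A"), (1, 1, 0, "B"), (1, 0, 1, "C"),
    (2, 2, 2, "D1"), (2, 3, 2, "D2"), (2, 2, 3, "D3"), (2, 3, 3, "D4"),
])
print(locate(example, 3, 2, max_depth=2))  # prints D2
```

Because the sample used to build the directory is static, such a hashmap could be built once and then only read, which is what makes the flat (depth, code) keying workable in this sketch.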

