Performance comparisons of spatial data processing techniques for a large scale mobile phone dataset

International Conference and Exhibition on Computing for Geospatial Research & Application. Pub date: 2012-07-01. DOI: 10.1145/2345316.2345346

Apichon Witayangkurn, T. Horanont, R. Shibasaki

Citations: 13

Abstract

Mobile technology, especially the mobile phone, is very popular nowadays. The increasing number of mobile users and the availability of GPS-embedded mobile phones generate large amounts of GPS trajectories that can be used in various research areas, such as human mobility and transportation planning. However, handling such a large-scale dataset is a significant issue, particularly in the spatial analysis domain. In this paper, we aimed to find a suitable way to extract the geo-location of GPS coordinates that supports large-scale data, processes quickly, and scales easily in both storage and calculation speed. Geo-locations are cities, zones, or any points of interest. Our dataset consists of GPS trajectories of 1.5 million individual mobile phone users in Japan accumulated over one year, approximately 9.2 billion records in total. We therefore conducted performance comparisons of various methods for processing spatial data, particularly for such a huge dataset. First, we processed the data on PostgreSQL with PostGIS, the traditional approach to spatial data processing. Second, we used a Java application with the spatial library Java Topology Suite (JTS). Third, we tried the Hadoop cloud computing platform, focusing on Hive on top of Hadoop to provide SQL-like support. However, Hadoop/Hive did not support spatial queries at the time. Hence, we proposed a solution to enable spatial support in Hive. As a result, Hadoop/Hive with spatial support performed best for large-scale processing among the evaluated methods. In addition, we recommend techniques in Hadoop/Hive for processing different types of spatial data.
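
The per-record operation at the heart of this comparison, assigning a GPS coordinate to the city or zone polygon that contains it, can be sketched without any spatial library. The sketch below is illustrative only and is not the authors' implementation: it uses even-odd ray casting for the point-in-polygon test (the kind of predicate that PostGIS's `ST_Contains` or JTS's `Geometry.contains` evaluates) plus a uniform grid over zone bounding boxes, so each point is tested only against nearby zones. The `Zone` and `GridIndex` names, the 0.5-degree cell size, and the sample square zones are hypothetical; polygons without holes are assumed, and points lying exactly on a shared boundary are not handled specially.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


@dataclass(frozen=True)
class Zone:
    """Hypothetical named polygon (e.g. a city or zone) given as a ring of vertices, no holes."""
    name: str
    ring: Sequence[Point]

    def bbox(self) -> Tuple[float, float, float, float]:
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting: count edges crossed by a ray from pt toward +x."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of pt
                inside = not inside
    return inside


class GridIndex:
    """Uniform grid: each cell lists the zones whose bounding box overlaps it."""

    def __init__(self, zones: List[Zone], cell_size: float):
        self.cell = cell_size
        self.cells: Dict[Tuple[int, int], List[Zone]] = {}
        for z in zones:
            minx, miny, maxx, maxy = z.bbox()
            for cx in range(self._key(minx), self._key(maxx) + 1):
                for cy in range(self._key(miny), self._key(maxy) + 1):
                    self.cells.setdefault((cx, cy), []).append(z)

    def _key(self, v: float) -> int:
        return int(v // self.cell)  # floor division also handles negative coordinates

    def locate(self, pt: Point) -> Optional[str]:
        """Return the name of the first candidate zone containing pt, or None."""
        candidates = self.cells.get((self._key(pt[0]), self._key(pt[1])), [])
        for z in candidates:
            if point_in_ring(pt, z.ring):  # exact test only on nearby zones
                return z.name
        return None


# Usage with two made-up adjacent square zones.
zones = [
    Zone("zone_a", [(139.0, 35.0), (140.0, 35.0), (140.0, 36.0), (139.0, 36.0)]),
    Zone("zone_b", [(140.0, 35.0), (141.0, 35.0), (141.0, 36.0), (140.0, 36.0)]),
]
index = GridIndex(zones, cell_size=0.5)
records = [(139.5, 35.5), (140.7, 35.2), (150.0, 10.0)]
print([index.locate(p) for p in records])  # ['zone_a', 'zone_b', None]
```

Because each record is located independently against a read-only index, this lookup parallelizes naturally over billions of records. Exposing equivalent logic to Hive (for example as a Java UDF, or as a script invoked through Hive's `TRANSFORM` clause) so that each mapper builds the zone index once and applies it to its share of the records is one plausible way to add spatial support; that is an assumption for illustration, not a description of the paper's design.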
