Toward Efficient Processing of Spatio-Temporal Workloads in a Distributed In-Memory System

2019 20th IEEE International Conference on Mobile Data Management (MDM). Pub Date: 2019-06-10. DOI: 10.1109/MDM.2019.00-66

Puya Memarzia, Maria Patrou, M. Alam, S. Ray, V. Bhavsar, K. Kent


Citations: 7

Abstract


Toward Efficient Processing of Spatio-Temporal Workloads in a Distributed In-Memory System

Location-based services (LBS) are a widely adopted technology that produces large volumes of spatio-temporal data at high velocity. Spatial data is also being generated from many other geo-spatial applications. To address the challenge of data volume, a number of big spatial data management systems have emerged that are based on the MapReduce paradigm. Recent projects have developed spatial data systems using Spark's distributed in-memory architecture. These projects, which include GeoSpark, SpatialSpark, and LocationSpark, do not support the high update rates required by LBS applications. Alternatively, systems such as MD-HBase support data updates, but are hindered by the performance characteristics of HBase, which is a disk-oriented framework. We present DISTIL+, a distributed spatio-temporal data processing system designed for high velocity location data. Our system achieves high update throughput and low query latency by leveraging the APGAS (Asynchronous Partitioned Global Address Space) architecture to build a multi-level distributed in-memory index. We present extensive experimental evaluation of our system, comparing several indexing and data placement schemes, as well as competing systems. Our results show that DISTIL+ excels at supporting high throughput location updates, and low latency spatio-temporal range queries and kNN queries, while offering better performance than existing approaches.
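The abstract names a multi-level distributed in-memory index but does not describe its layout. As a rough illustration only, the sketch below assumes a uniform grid with hash-based placement of cells onto simulated partitions; neither choice is stated in the source, and the class `GridIndex` and all of its parameters are hypothetical. It shows the two operations the abstract emphasizes: location updates that replace an object's previous position, and a spatio-temporal range query.

```python
from collections import defaultdict


class GridIndex:
    """Toy two-level index: partition -> grid cell -> {object_id: (x, y, t)}.

    Hypothetical illustration; not the DISTIL+ design.
    """

    def __init__(self, cell_size=10.0, num_partitions=4):
        self.cell_size = cell_size
        self.num_partitions = num_partitions
        # Level 1: one dict per simulated partition; level 2: cell -> objects.
        self.partitions = [defaultdict(dict) for _ in range(num_partitions)]
        self.last_cell = {}  # object_id -> cell that currently holds it

    def _cell(self, x, y):
        # Map a coordinate to the grid cell that contains it.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def _partition(self, cell):
        # Assumed placement scheme: hash the cell onto a partition.
        return self.partitions[hash(cell) % self.num_partitions]

    def update(self, object_id, x, y, t):
        """Record the latest position of an object, replacing the old one."""
        old = self.last_cell.get(object_id)
        new = self._cell(x, y)
        if old is not None and old != new:
            del self._partition(old)[old][object_id]
        self._partition(new)[new][object_id] = (x, y, t)
        self.last_cell[object_id] = new

    def range_query(self, x_min, y_min, x_max, y_max, t_min, t_max):
        """Return sorted ids whose latest position is in the box and time window."""
        cx_min, cy_min = self._cell(x_min, y_min)
        cx_max, cy_max = self._cell(x_max, y_max)
        hits = []
        # Visit only the cells that overlap the query rectangle.
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                cell = (cx, cy)
                for oid, (x, y, t) in self._partition(cell).get(cell, {}).items():
                    if (x_min <= x <= x_max and y_min <= y <= y_max
                            and t_min <= t <= t_max):
                        hits.append(oid)
        return sorted(hits)
```

In this sketch an update touches at most two cells, and a range query scans only the cells overlapping the query rectangle. In a real distributed setting each partition would live on a separate node, which this single-process example does not model.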

