JUST: JD Urban Spatio-Temporal Data Engine

2020 IEEE 36th International Conference on Data Engineering (ICDE), pp. 1558-1569. Pub Date: 2020-04-01. DOI: 10.1109/ICDE48307.2020.00138

Ruiyuan Li, Huajun He, Rubin Wang, Yuchuan Huang, Junwen Liu, Sijie Ruan, Tianfu He, Jie Bao, Yu Zheng


Citations: 42

Abstract

With the prevalence of positioning techniques, a prodigious amount of spatio-temporal data is generated constantly. To effectively support sophisticated urban applications built on spatio-temporal data, e.g., location-based services, an efficient, scalable, update-enabled, and easy-to-use spatio-temporal data management system is desirable. This paper presents JUST, i.e., the JD Urban Spatio-Temporal data engine, which can efficiently manage big spatio-temporal data in a convenient way. JUST incorporates the distributed NoSQL data store Apache HBase as the underlying storage, GeoMesa as the spatio-temporal data indexing tool, and Apache Spark as the execution engine. We design two indexing techniques, i.e., Z2T and XZ2T, which accelerate spatio-temporal queries tremendously. Furthermore, we introduce a compression mechanism, which not only greatly reduces the storage cost but also improves the query efficiency. To make JUST easy to use, we design and implement a complete SQL engine, with which all operations can be performed through a SQL-like query language, i.e., JustQL. JUST also inherently supports new data insertions and historical data updates without index reconstruction. JUST is deployed as a PaaS in JD with multi-user support. Many applications have been developed based on the SDKs provided by JUST. Extensive experiments compare JUST with six state-of-the-art distributed spatio-temporal data management systems on two real datasets and one synthetic dataset. The results show that JUST has competitive query performance and is much more scalable than these systems.
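The abstract names the Z2T index but does not define it. As a rough illustration only, the sketch below assumes that such an index pairs a coarse time-period number with a two-dimensional Z-order (Morton) code of the location, so that records from the same period and nearby places sort close together in a key-ordered store such as HBase. The function names, the 16-bit resolution, and the one-day period are all hypothetical choices made for this example; they are not taken from JUST or GeoMesa.

```python
from datetime import datetime, timezone
from typing import Tuple

BITS = 16                 # bits per spatial dimension (assumed for this sketch)
PERIOD_SECONDS = 86_400   # one-day time periods (assumed for this sketch)


def spread_bits(v: int) -> int:
    """Insert a zero bit between each of the low 16 bits of v."""
    v &= 0xFFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def z2(lon: float, lat: float) -> int:
    """Interleave the bits of the normalised longitude and latitude (Morton code)."""
    cells = 1 << BITS
    # Map each coordinate to an integer cell index, clamping the upper edge.
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    # Longitude bits occupy even positions, latitude bits odd positions.
    return spread_bits(x) | (spread_bits(y) << 1)


def period_then_z2_key(lon: float, lat: float, ts: datetime) -> Tuple[int, int]:
    """Hypothetical key: (time-period number, Z-order code of the point)."""
    period = int(ts.timestamp()) // PERIOD_SECONDS
    return (period, z2(lon, lat))


if __name__ == "__main__":
    ts = datetime(2020, 4, 1, 12, 0, tzinfo=timezone.utc)
    print(period_then_z2_key(116.40, 39.90, ts))
```

Under this assumed layout, new records only append keys to the sorted store, which is one plausible reading of how insertions and updates could avoid index reconstruction; the abstract itself does not spell out the mechanism.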



