SMART-IMPALA: Efficient Querying of Hyper Massive Spatiotemporal Trajectory Data

Lianjie Zhou, Wei Tu, Qingquan Li
Published in: 2021 28th International Conference on Geoinformatics
Publication date: 2021-11-03
DOI: https://doi.org/10.1109/ieeeconf54055.2021.9687505
Citations: 2

Abstract

Efficient sharing of hyper massive spatiotemporal trajectory data (HMSTD) is the foundation for establishing large-scale perception infrastructure, such as vehicle monitoring networks in smart cities, including megacities like New York, Tokyo, Beijing, and Shanghai. The daily trajectory data volume of vehicle monitoring networks in a smart city is growing rapidly, reaching 1 billion records per day. Accessing HMSTD in transport, the Internet of Things, and other fields is difficult and limited under present spatiotemporal data indexing methods. We therefore propose SMART-Impala, a path-divided Hadoop Distributed File System (HDFS) data blocking method (SMART) built on Apache Impala, to make access to HMSTD more efficient and thereby improve the efficiency of hyperdata sharing. Apache Impala, a practical and powerful means of distributed access to massive data stored in memory, is widely applied in massive data sharing. SMART-Impala extends Impala's capability for retrieving spatiotemporal trajectory data and proposes a self-adaptive Parquet data partitioning strategy. In the experiments, the Shenzhen BeiDou (BD) bus network is selected as the scenario; it consists of 35,809 buses equipped with BD positioning sensors, which generate 1.03 billion data records each day. The distribution of buses in Shenzhen is obtained for 7:00 a.m. to 9:00 a.m. and for 11:00 a.m. to 1:00 p.m. Moreover, SMART-Impala achieves approximately 8, 9, 29, and 110 times higher performance than MongoDB or HBase at data scales of 10 million, 100 million, 500 million, and 1 billion records, respectively, and its results also outperform those of the average-division approach in Impala, MongoDB, and HBase.
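The abstract does not spell out how the path-divided blocking is laid out, so the following is only an illustrative sketch of the general idea: trajectory records are grouped into Hive-style `key=value` directories, the layout Impala uses for partitioned tables, so that a time-window query reads only the matching directories. The `Record` type, the `day`/`hour`/`cell` keys, the 0.05-degree grid size, and the function names are all assumptions made for illustration, not the authors' scheme.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class Record(NamedTuple):
    """One positioning fix from a bus (hypothetical schema)."""
    bus_id: str
    time: datetime
    lon: float
    lat: float


CELL_DEG = 0.05  # assumed grid cell size in degrees, not taken from the paper


def partition_path(rec: Record) -> str:
    """Map a record to a Hive-style partition directory (assumed layout)."""
    col = math.floor(rec.lon / CELL_DEG)
    row = math.floor(rec.lat / CELL_DEG)
    return f"day={rec.time:%Y-%m-%d}/hour={rec.time:%H}/cell={col}_{row}"


def group_by_partition(records):
    """Group records by the directory they would be written to."""
    groups = defaultdict(list)
    for rec in records:
        groups[partition_path(rec)].append(rec)
    return dict(groups)


def prune(paths, day, start_hour, end_hour):
    """Keep only partitions of `day` with start_hour <= hour < end_hour."""
    wanted = [f"day={day}/hour={h:02d}/" for h in range(start_hour, end_hour)]
    return [p for p in paths if any(p.startswith(w) for w in wanted)]


if __name__ == "__main__":
    records = [
        Record("b1", datetime(2021, 11, 3, 7, 30), 114.06, 22.54),
        Record("b2", datetime(2021, 11, 3, 7, 45), 114.07, 22.53),
        Record("b3", datetime(2021, 11, 3, 12, 0), 114.06, 22.54),
    ]
    groups = group_by_partition(records)
    # A morning-peak query (7:00 to 9:00) only needs the hour=07 directory.
    for path in prune(groups, "2021-11-03", 7, 9):
        print(path, len(groups[path]))
```

In this sketch the pruning step is what saves work: a query for the 7:00 a.m. to 9:00 a.m. window never opens the directory holding the midday record. A self-adaptive strategy such as the one the abstract mentions would presumably vary the partition granularity with data density, which this fixed-grid example does not attempt.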