Stab-Forests: Dynamic Data Structures for Efficient Temporal Query Processing

Venue: Time · Publication date: 2020-01-01 · DOI: 10.4230/LIPIcs.TIME.2020.18
Jelle Hellings, Yuqing Wu
Pages: 18:1-18:19
Citations: 1

Abstract

Many sources of data have temporal start and end attributes or are created in a time-ordered manner. Hence, it is only natural to consider joining datasets based on these temporal attributes. To do so efficiently, several internal-memory temporal join algorithms have recently been proposed. Unfortunately, these join algorithms are designed to join entire datasets and cannot efficiently join skewed datasets in which only a few events participate in the join result. To support high-performance internal-memory temporal joins of skewed datasets, we propose the skip-join algorithm, which operates on stab-forests. The stab-forest is a novel dynamic data structure for indexing temporal data that allows efficient updates when events are appended in a time-based order. Our stab-forests efficiently support not only traditional temporal stab-queries, but also more general multi-stab-queries. We conducted an experimental evaluation comparing the skip-join algorithm with state-of-the-art techniques on real-world datasets. We observed that the skip-join algorithm outperforms other techniques by an order of magnitude when joining skewed datasets and delivers performance comparable to other techniques on non-skewed datasets.

2012 ACM Subject Classification: Information systems → Join algorithms; Information systems → Temporal data
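The abstract names stab-queries and temporal joins without spelling out what they compute. As a rough illustration of the query semantics only, here is a minimal Python sketch under stated assumptions: each event is a closed interval `[start, end]`, events are appended in nondecreasing start order, a stab-query at time `t` returns every event whose interval contains `t`, and a temporal join reports every pair of overlapping events. The names `AppendOnlyIntervalIndex` and `overlap_join` are hypothetical. This is a naive linear-scan baseline, not the authors' stab-forest or skip-join algorithm, whose internals are not described in this abstract.

```python
from bisect import bisect_right
from typing import List, Tuple

# Assumed event shape: (event_id, start, end) with a closed interval [start, end].
Event = Tuple[str, int, int]


class AppendOnlyIntervalIndex:
    """Hypothetical naive index for events appended in start-time order.

    Illustration only, NOT the stab-forest: a stab query scans every event
    that started at or before t, so it costs O(n) in the worst case.
    """

    def __init__(self) -> None:
        self._starts: List[int] = []
        self._events: List[Event] = []

    def append(self, event: Event) -> None:
        _, start, end = event
        if end < start:
            raise ValueError("end must not precede start")
        if self._starts and start < self._starts[-1]:
            raise ValueError("events must be appended in start-time order")
        self._starts.append(start)
        self._events.append(event)

    def stab(self, t: int) -> List[Event]:
        """Return all events whose interval [start, end] contains t."""
        upto = bisect_right(self._starts, t)  # events with start <= t
        return [e for e in self._events[:upto] if e[2] >= t]

    def starting_in(self, lo: int, hi: int) -> List[Event]:
        """Return all events with lo < start <= hi."""
        first = bisect_right(self._starts, lo)
        last = bisect_right(self._starts, hi)
        return self._events[first:last]


def overlap_join(small: List[Event], index: AppendOnlyIntervalIndex) -> List[Tuple[str, str]]:
    """Report every pair (a, b), a in `small`, b in `index`, whose intervals overlap.

    Any b overlapping a either contains a.start (found by a stab query) or
    starts inside (a.start, a.end]. The two cases are disjoint, so each pair
    is reported exactly once. Only `small` is iterated over.
    """
    pairs = []
    for a_id, a_start, a_end in small:
        for b_id, _, _ in index.stab(a_start) + index.starting_in(a_start, a_end):
            pairs.append((a_id, b_id))
    return pairs


idx = AppendOnlyIntervalIndex()
for ev in [("b1", 0, 4), ("b2", 2, 3), ("b3", 5, 9), ("b4", 7, 8)]:
    idx.append(ev)

print(idx.stab(3))                         # [('b1', 0, 4), ('b2', 2, 3)]
print(overlap_join([("a1", 3, 6)], idx))   # [('a1', 'b1'), ('a1', 'b2'), ('a1', 'b3')]
```

The join loops only over the smaller input and probes the index, which is the access pattern that can pay off when few events participate in the result. A dedicated structure such as the stab-forest would need to answer `stab` without scanning all earlier events. How it does so is described in the full paper, not here.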