Efficiently Answering Top-k Window Aggregate Queries: Calculating Coverage Number Sequences over Hierarchical Structures

2023 IEEE 39th International Conference on Data Engineering (ICDE), Pub Date: 2023-04-01, DOI: 10.1109/ICDE55515.2023.00104

Jianqiu Xu, R. C. Wong



Abstract

Given a set of spatio-temporal objects, a top-k window aggregate query reports the top-k tuples, ordered by the number of objects present during a given time interval and within a given spatial range. For example, when analyzing traffic density in a city, one may wish to retrieve the top-k time intervals for a certain area, in decreasing order of the number of vehicles passing by. Because a sequential scan over all objects is costly, an index structure is typically built to improve query performance. A crucial step during query evaluation is to determine the number of objects in an arbitrary node, called the coverage number sequence. This is challenging because objects appear and disappear at different time points, so the number of objects in the query node changes over time. Moreover, because the index is hierarchical, the value of a high-level node is obtained by aggregating over its child nodes. Simply enumerating all objects in the sub-tree rooted at the query node performs poorly, mainly because it (i) traverses the sub-tree to retrieve a large number of time points and (ii) repeatedly performs the aggregation at certain time points. We propose an efficient approach that addresses these performance issues for both the R-tree and the Octree and supports updates as newly arriving data objects are inserted into the index. According to a thorough complexity analysis, our approach generally outperforms alternative methods. Coverage number sequences, together with the proposed optimization techniques, are used to improve the performance of window aggregate queries. A comprehensive experimental evaluation over large real datasets in a database system confirms that our approach outperforms the alternatives.
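To make the notion of a coverage number sequence concrete, the following is a minimal Python sketch of the naive baseline the abstract describes, not the approach proposed in the paper. It assumes each object is reduced to a half-open lifetime interval `[start, end)` and has already been filtered to the node's spatial region. The names `coverage_sequence`, `merge_sequences`, and `top_k_intervals` are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # assumed half-open lifetime [start, end) of one object
Step = Tuple[int, int]      # (time point, number of objects from that time on)


def _accumulate(delta: dict) -> List[Step]:
    """Turn per-time-point count changes into a step function."""
    seq: List[Step] = []
    count = 0
    for t in sorted(delta):
        if delta[t] == 0:
            continue  # arrivals and departures cancel out; the count is unchanged
        count += delta[t]
        seq.append((t, count))
    return seq


def coverage_sequence(intervals: Iterable[Interval]) -> List[Step]:
    """Leaf-level sequence: +1 when an object appears, -1 when it disappears."""
    delta = defaultdict(int)
    for start, end in intervals:
        delta[start] += 1
        delta[end] -= 1
    return _accumulate(delta)


def merge_sequences(children: Iterable[List[Step]]) -> List[Step]:
    """Parent sequence obtained by aggregating the child sequences (naive)."""
    delta = defaultdict(int)
    for seq in children:
        prev = 0
        for t, count in seq:
            delta[t] += count - prev  # recover each child's change at time t
            prev = count
    return _accumulate(delta)


def top_k_intervals(seq: List[Step], k: int) -> List[Tuple[int, int, int]]:
    """Top-k (count, start, end) pieces of the step function, highest count first."""
    pieces = [(c, t0, t1) for (t0, c), (t1, _) in zip(seq, seq[1:])]
    return sorted(pieces, key=lambda p: (-p[0], p[1]))[:k]


# Two hypothetical child nodes, each holding two object lifetimes.
left = [(0, 10), (5, 15)]
right = [(5, 8), (12, 20)]
parent = merge_sequences([coverage_sequence(left), coverage_sequence(right)])
print(parent)
print(top_k_intervals(parent, 2))
```

In this sketch, merging visits every time point of every child each time a parent sequence is needed, which illustrates the repeated traversal and aggregation cost that the abstract identifies as the bottleneck.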

