Querying Interval Data on Steroids

IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 10, pp. 6120-6134. Published 2025-08-11. DOI: 10.1109/TKDE.2025.3597399. URL: https://ieeexplore.ieee.org/document/11122274/

Panagiotis Bouros; George Christodoulou; Christian Rauch; Artur Titkov; Nikos Mamoulis

Abstract

A wide range of applications manage interval data, with selections and overlap joins being the most fundamental querying operations. Selection queries are typically evaluated using interval indexing. However, the state-of-the-art HINT index and its competitors are designed only for single query requests, while modern systems receive a large number of queries at the same time. In view of this challenge, we study the batch processing of selection queries on HINT. We propose two novel strategies, termed level-based and partition-based, which operate in a per-level fashion, i.e., they collect the results for all queries at an index level before moving to the next. The new strategies reduce cache misses when climbing the index hierarchy; in particular, partition-based can avoid scanning any index partition more than once. Our experiments on real-world intervals showed that our batch strategies always outperform a baseline which executes queries serially, and that partition-based is overall the most efficient. Motivated by our shared computation techniques for query batches, we also revisit overlap joins across the entire spectrum of setups, depending on whether interval indexes (pre-)exist on the inputs. For unindexed inputs, we enhance the state-of-the-art optFS join algorithm with the effective partitioning proposed for HINT; for indexed inputs, we propose a novel algorithm, HINT-join, which concurrently scans the input indices, joining partition pairs with optFS. Our tests showed the advantage of HINT-join over indexed nested-loops solutions that employ either B+-trees or probing of a single HINT, even when the latter is powered by our partition-based batch processing.
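To make the batching and join ideas concrete, here is a minimal Python sketch under explicit assumptions. `SimpleHierIndex`, `query`, `batch_query` and `forward_scan_join` are hypothetical names, not the authors' code. The index is a deliberately simplified, HINT-inspired hierarchy over an integer domain [0, 2^m): level l splits the domain into 2^l equal partitions, and each interval is stored once, in the deepest partition that contains it; HINT's own assignment scheme and optimizations are omitted. `batch_query` follows the partition-based idea: at each level it groups the batch by the partitions each query touches and scans every touched partition once for all of those queries, whereas `query` is the serial baseline. `forward_scan_join` is the basic forward-scan plane sweep over closed intervals; we assume optFS refines this scheme, but none of its optimizations are shown.

```python
from collections import defaultdict


class SimpleHierIndex:
    """Hypothetical, simplified HINT-inspired index (not HINT itself).

    Domain is [0, 2**m); level l has 2**l partitions of width 2**(m - l).
    Each interval is stored once, in the deepest partition containing it.
    """

    def __init__(self, m, intervals):
        self.m = m
        # levels[l][p] -> list of (interval_id, start, end)
        self.levels = [defaultdict(list) for _ in range(m + 1)]
        for iid, (s, e) in enumerate(intervals):
            assert 0 <= s <= e < 2 ** m, "interval outside the domain"
            level = self._level_for(s, e)
            self.levels[level][s >> (m - level)].append((iid, s, e))

    def _level_for(self, s, e):
        # Deepest level at which both endpoints fall into the same partition;
        # level 0 (a single partition) always qualifies.
        for level in range(self.m, -1, -1):
            shift = self.m - level
            if s >> shift == e >> shift:
                return level

    def query(self, qs, qe):
        """Serial baseline: evaluate one query, climbing from the bottom level."""
        out = []
        for level in range(self.m, -1, -1):
            shift = self.m - level
            for p in range(qs >> shift, (qe >> shift) + 1):
                for iid, s, e in self.levels[level].get(p, ()):
                    if s <= qe and qs <= e:  # closed-interval overlap test
                        out.append(iid)
        return out

    def batch_query(self, queries):
        """Partition-based batch sketch: per level, each touched partition is
        scanned once and its intervals are tested against every query that
        touches it."""
        results = [[] for _ in queries]
        for level in range(self.m, -1, -1):
            shift = self.m - level
            touching = defaultdict(list)  # partition id -> query ids
            for qid, (qs, qe) in enumerate(queries):
                for p in range(qs >> shift, (qe >> shift) + 1):
                    touching[p].append(qid)
            for p in sorted(touching):
                for iid, s, e in self.levels[level].get(p, ()):
                    for qid in touching[p]:
                        qs, qe = queries[qid]
                        if s <= qe and qs <= e:
                            results[qid].append(iid)
        return results


def forward_scan_join(R, S):
    """Basic forward-scan (FS) plane-sweep overlap join, not optFS.

    R and S are lists of (id, start, end) closed intervals.
    """
    R = sorted(R, key=lambda t: t[1])
    S = sorted(S, key=lambda t: t[1])
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][1] < S[j][1]:
            # R[i] starts first: pair it with every S interval starting by its end
            rid, _, r_end = R[i]
            k = j
            while k < len(S) and S[k][1] <= r_end:
                out.append((rid, S[k][0]))
                k += 1
            i += 1
        else:
            # S[j] starts first (or ties): symmetric forward scan over R
            sid, _, s_end = S[j]
            k = i
            while k < len(R) and R[k][1] <= s_end:
                out.append((R[k][0], sid))
                k += 1
            j += 1
    return out
```

For instance, with m = 4, the intervals [(0, 3), (2, 9), (5, 5), (8, 15), (12, 13), (1, 14)] (ids 0 to 5) and the batch [(4, 6), (10, 11), (0, 0)], `batch_query` returns the ids {1, 2, 5}, {3, 5} and {0}, the same lists that three separate calls to `query` produce. In the sketch, the saving comes from reading each shared partition once per level instead of once per query; actual cache behaviour depends on the memory layout, which the sketch does not model.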
