TagTree: Global Tagging Index with Efficient Querying for Time Series Databases

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). Pub Date: 2022-05-01. DOI: 10.1109/ipdps53621.2022.00127

Jin Xue, Zhiqi Wang, Tianyu Wang, Z. Shao


Citations: 1

Abstract

Modern time series databases come with a tag-based query interface that lets users select time series, which are essentially sequences of timestamped data values, by a set of specific tags. A tagging index is the component that serves such tag-based queries efficiently. However, existing methods store tag information in external databases or in time-partitioned data structures, which hurts query performance. In this paper, we present a novel abstraction for efficient queries of tag information in time series databases: a hybrid tagging index that manages all tags in one place. By managing tag information globally in a single disk-based data structure, we fundamentally relieve memory pressure and eliminate the I/O overhead of the duplicate metadata that existing methods incur. Furthermore, the tagging index is internally partitioned by time to support time-range queries and data retention, both of which are essential to time series databases. We implement the proposed tagging index as a standalone module that can be integrated with time series storage engines. Experiments on the TSBS benchmark show that our method speeds up queries by 84.0% and 87.2% on average compared to Prometheus (which uses a time-partitioned segment method) and Graphite (which uses an external database for tag management), respectively.
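To make the idea of a single global tag index with internal time partitioning concrete, here is a minimal in-memory sketch. It is not the TagTree data structure (the abstract describes a disk-based structure whose layout is not given here); the class name `GlobalTagIndex`, its methods, and the fixed `partition_seconds` width are all hypothetical choices made for illustration. It keeps one postings map for all tags and records, per series, the time partitions in which that series was seen, so that time-range queries and retention can both be served from the same structure.

```python
from collections import defaultdict


class GlobalTagIndex:
    """Toy inverted index (hypothetical, in-memory; NOT the paper's structure).

    Maps (tag key, tag value) -> {series id: set of time-partition ids}.
    """

    def __init__(self, partition_seconds=3600):
        self.partition_seconds = partition_seconds
        # One global postings map shared by all time partitions.
        self.postings = defaultdict(lambda: defaultdict(set))

    def _partition(self, ts):
        # Integer id of the fixed-width time partition containing ts.
        return ts // self.partition_seconds

    def add(self, series_id, tags, ts):
        """Record that series_id, labelled with tags, has data at time ts."""
        part = self._partition(ts)
        for pair in tags.items():
            self.postings[pair][series_id].add(part)

    def query(self, tags, start, end):
        """Return ids of series matching ALL tags with data in [start, end]."""
        lo, hi = self._partition(start), self._partition(end)
        result = None
        for pair in tags.items():
            entry = self.postings.get(pair, {})
            matched = {
                sid for sid, parts in entry.items()
                if any(lo <= p <= hi for p in parts)
            }
            result = matched if result is None else result & matched
            if not result:
                return set()
        return result or set()

    def drop_before(self, ts):
        """Retention: forget every partition that ends before ts's partition."""
        cutoff = self._partition(ts)
        for pair in list(self.postings):
            entry = self.postings[pair]
            for sid in list(entry):
                entry[sid] = {p for p in entry[sid] if p >= cutoff}
                if not entry[sid]:
                    del entry[sid]
            if not entry:
                del self.postings[pair]


idx = GlobalTagIndex(partition_seconds=3600)
idx.add(1, {"host": "a", "region": "us"}, ts=100)
idx.add(2, {"host": "b", "region": "us"}, ts=4000)
print(idx.query({"region": "us"}, 0, 3599))
print(idx.query({"region": "us", "host": "b"}, 0, 7200))
```

Because each tag pair is stored once regardless of how many partitions a series spans, the sketch avoids the per-partition duplication of tag metadata that the abstract attributes to time-partitioned designs, while the per-series partition sets still allow range filtering and retention.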

