TagTree: Global Tagging Index with Efficient Querying for Time Series Databases
Jin Xue, Zhiqi Wang, Tianyu Wang, Z. Shao
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022-05-01
DOI: https://doi.org/10.1109/ipdps53621.2022.00127
Citation count: 1
Abstract
Modern time series databases provide a tag-based query interface that lets users select time series, which are essentially sequences of timestamped data values, by a set of specific tags. A tagging index is the component that serves such tag-based queries efficiently. However, existing methods store tag information in external databases or in time-partitioned data structures, which hurts query performance. In this paper, we present a novel abstraction for efficient queries of tag information in time series databases: a hybrid tagging index that manages all tags in one place. By managing tag information globally in a single disk-based data structure, we can fundamentally relieve memory pressure and eliminate the I/O overhead of duplicate metadata incurred by existing methods. Furthermore, the tagging index is internally partitioned by time to support time-range-based queries and data retention, both of which are essential to time series databases. We implement the proposed tagging index as a standalone module that can be integrated with time series storage engines. Experiments on the TSBS benchmark show that our method speeds up queries by 84.0% and 87.2% on average compared to Prometheus (which uses a time-partitioned segment method) and Graphite (which uses an external database for tag management), respectively.
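To make the idea of a single global tag index that is internally partitioned by time more concrete, the following is a minimal in-memory sketch. It is not the TagTree data structure from the paper, which is disk-based and whose layout the abstract does not describe. The class name `GlobalTagIndex`, its methods, and the fixed-width partitioning by `partition_seconds` are all assumptions made for illustration.

```python
from collections import defaultdict


class GlobalTagIndex:
    """Toy tag index (hypothetical): one global map, postings bucketed by time."""

    def __init__(self, partition_seconds=3600):
        # Assumption: fixed-width time partitions; the paper's scheme may differ.
        self.partition_seconds = partition_seconds
        # (tag_key, tag_value) -> {partition_id -> set of series ids}
        self._postings = defaultdict(lambda: defaultdict(set))

    def _partition(self, timestamp):
        return timestamp // self.partition_seconds

    def add(self, series_id, tags, timestamp):
        """Record that series_id, carrying `tags`, has data at `timestamp`."""
        part = self._partition(timestamp)
        for pair in tags.items():
            self._postings[pair][part].add(series_id)

    def query(self, tags, start, end):
        """Return ids of series that match every tag and have data in a
        partition overlapping [start, end] (partition-level granularity)."""
        parts = range(self._partition(start), self._partition(end) + 1)
        result = None
        for pair in tags.items():
            by_part = self._postings.get(pair, {})
            matched = set()
            for part in parts:
                matched |= by_part.get(part, set())
            # Intersect the per-tag matches so that all tags must hold.
            result = matched if result is None else result & matched
        return result or set()

    def drop_before(self, timestamp):
        """Retention: discard whole partitions that end before `timestamp`."""
        cutoff = self._partition(timestamp)
        for by_part in self._postings.values():
            for part in [p for p in by_part if p < cutoff]:
                del by_part[part]
```

In this sketch each tag pair is stored once in the global map, and only its posting sets are split by time, so a time-range query touches just the relevant partitions and retention drops whole partitions without rewriting tag metadata.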