A Data-aware Learned Index Scheme for Efficient Writes

Proceedings of the 51st International Conference on Parallel Processing. Pub Date: 2022-08-29. DOI: 10.1145/3545008.3545077

Li Liu, Chun-hua Li, Zhou Zhang, Yuhan Liu, Ke Zhou, Ji Zhang


Citations: 1

Abstract

Index structures are critical for efficient data access and overall performance in storage systems. Learned indexes replace range index structures (such as the B+ Tree) with recursive index models that predict the position of a lookup key in a dataset. This new paradigm greatly reduces query time and index size; however, it supports only read-only workloads. Although some studies reserve gaps between keys for new data in order to support updates, they incur high memory overhead and shift costs when a large amount of data is inserted. In this paper, we propose EWALI, a highly scalable data-aware learned index scheme that constructs index models based on a lightweight data-aware data partition algorithm. When the data distribution changes, EWALI automatically splits the affected leaf nodes and retrains the corresponding models to accommodate different workloads. In addition, EWALI uses alternating dual buffers to handle new data and adopts a delayed update mechanism to merge data, greatly reducing write locking and improving write performance. We evaluate EWALI with real-world and synthetic datasets. Extensive experimental results show that EWALI reduces write latency by 60.9% and 33.7% compared with the state-of-the-art Fitting-Tree and XIndex, respectively, and achieves up to a 3.1× improvement in range query performance compared with XIndex.
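
To make the ideas in the abstract concrete, the following is a minimal single-threaded sketch of a learned leaf node with two alternating write buffers and a delayed merge. It is not the EWALI implementation: the class `LeafNode`, the parameter `buffer_capacity`, the least-squares linear model, and the synchronous merge are all assumptions chosen for illustration, and the data partitioning and node splitting described above are omitted.

```python
import bisect


class LeafNode:
    """Hypothetical leaf: a sorted key array, a linear model, two write buffers."""

    def __init__(self, keys, buffer_capacity=4):
        self.keys = sorted(keys)
        self.buffer_capacity = buffer_capacity  # assumed tuning knob
        self.active = []   # buffer that receives new writes
        self.frozen = []   # full buffer waiting for the delayed merge
        self._train()

    def _train(self):
        # Fit position ~ slope * key + intercept by least squares,
        # then record the worst-case prediction error over stored keys.
        n = len(self.keys)
        if n == 0:
            self.slope, self.intercept, self.max_err = 0.0, 0.0, 0
            return
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        self.max_err = max(abs(self._predict(k) - i)
                           for i, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def merge(self):
        # Delayed update: fold the frozen buffer into the sorted array
        # and retrain the model on the merged keys.
        if not self.frozen:
            return
        self.keys = sorted(self.keys + self.frozen)
        self.frozen = []
        self._train()

    def insert(self, key):
        # Writes only touch the active buffer, never the sorted array.
        bisect.insort(self.active, key)
        if len(self.active) >= self.buffer_capacity:
            self.merge()  # drain the previously frozen buffer first
            self.active, self.frozen = [], self.active  # swap buffer roles

    def contains(self, key):
        # Model-guided search restricted to the recorded error bound.
        n = len(self.keys)
        if n:
            pos = self._predict(key)
            lo = min(max(0, pos - self.max_err), n)
            hi = max(lo, min(n, pos + self.max_err + 1))
            i = bisect.bisect_left(self.keys, key, lo, hi)
            if i < n and self.keys[i] == key:
                return True
        # Keys not yet merged still live in one of the two buffers.
        for buf in (self.active, self.frozen):
            j = bisect.bisect_left(buf, key)
            if j < len(buf) and buf[j] == key:
                return True
        return False
```

In a concurrent design the merge of the frozen buffer would presumably run in the background while the active buffer keeps accepting writes, which is where the reduction in write locking would come from; this sketch merges synchronously only to stay short.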

