Li Liu, Chun-hua Li, Zhou Zhang, Yuhan Liu, Ke Zhou, Ji Zhang
{"title":"高效写的数据感知学习索引方案","authors":"Li Liu, Chun-hua Li, Zhou Zhang, Yuhan Liu, Ke Zhou, Ji Zhang","doi":"10.1145/3545008.3545077","DOIUrl":null,"url":null,"abstract":"Index structure is very important for efficient data access and system performance in the storage system. Learned index utilizes recursive index models to replace range index structure (such as B+ Tree) so as to predict the position of a lookup key in a dataset. This new paradigm greatly reduces query time and index size, however it only supports read-only workloads. Although some studies reserve gaps between keys for new data to support update, they incur high memory space and shift cost when a large number of data are inserted. In this paper, we propose a data-aware learned index scheme with high scalability, called EWALI, which constructs index models based on a lightweight data-aware data partition algorithm. When the data distribution changes, EWALI can automatically split the related leaf nodes and retrain the corresponding models to accommodate different workloads. In addition, EWALI designs an alternative duel buffers to handle new data and adopts the delayed update mechanism to merge data, greatly reducing write locking and improving write performance. We evaluate EWALI with real-world and synthetic datasets. Extensive experimental results show that EWALI reduces write latency respectively by 60.9% and 33.7% than state-of-the-art Fitting-Tree and XIndex, and achieves up to 3.1 × performance improvement in terms of range query comparing with XIndex.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Data-aware Learned Index Scheme for Efficient Writes\",\"authors\":\"Li Liu, Chun-hua Li, Zhou Zhang, Yuhan Liu, Ke Zhou, Ji Zhang\",\"doi\":\"10.1145/3545008.3545077\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Index structure is very important for efficient data access and system performance in the storage system. Learned index utilizes recursive index models to replace range index structure (such as B+ Tree) so as to predict the position of a lookup key in a dataset. This new paradigm greatly reduces query time and index size, however it only supports read-only workloads. Although some studies reserve gaps between keys for new data to support update, they incur high memory space and shift cost when a large number of data are inserted. In this paper, we propose a data-aware learned index scheme with high scalability, called EWALI, which constructs index models based on a lightweight data-aware data partition algorithm. When the data distribution changes, EWALI can automatically split the related leaf nodes and retrain the corresponding models to accommodate different workloads. In addition, EWALI designs an alternative duel buffers to handle new data and adopts the delayed update mechanism to merge data, greatly reducing write locking and improving write performance. We evaluate EWALI with real-world and synthetic datasets. 
Extensive experimental results show that EWALI reduces write latency respectively by 60.9% and 33.7% than state-of-the-art Fitting-Tree and XIndex, and achieves up to 3.1 × performance improvement in terms of range query comparing with XIndex.\",\"PeriodicalId\":360504,\"journal\":{\"name\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3545008.3545077\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Data-aware Learned Index Scheme for Efficient Writes
Index structures are critical for efficient data access and overall system performance in storage systems. A learned index replaces a range index structure (such as the B+ Tree) with recursive index models that predict the position of a lookup key in a dataset. This new paradigm greatly reduces query time and index size, but it supports only read-only workloads. Although some studies reserve gaps between keys to accommodate new data and support updates, they incur high memory overhead and shift costs when a large amount of data is inserted. In this paper, we propose a highly scalable data-aware learned index scheme, called EWALI, which constructs index models based on a lightweight data-aware partition algorithm. When the data distribution changes, EWALI automatically splits the affected leaf nodes and retrains the corresponding models to accommodate different workloads. In addition, EWALI uses alternating dual buffers to absorb new data and adopts a delayed update mechanism to merge data, greatly reducing write locking and improving write performance. We evaluate EWALI on real-world and synthetic datasets. Extensive experimental results show that EWALI reduces write latency by 60.9% and 33.7% compared with the state-of-the-art FITing-Tree and XIndex, respectively, and achieves up to a 3.1× improvement in range query performance over XIndex.
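Two ideas from the abstract lend themselves to a concrete illustration: a learned model that predicts a key's position within a bounded error, and a write buffer whose contents are merged into the sorted data lazily rather than on every insert. The Python sketch below is not EWALI's implementation (which uses a data-aware partition algorithm, dual buffers, and automatic node splitting); the class name LearnedLeaf, the single write buffer, and the buffer_limit threshold are illustrative assumptions meant only to convey the mechanism.

```python
import bisect

class LearnedLeaf:
    """Illustrative leaf node: a linear model predicts a key's position in a
    sorted array; new keys go to a write buffer that is merged lazily."""

    def __init__(self, keys, buffer_limit=64):
        self.keys = sorted(keys)          # sorted, model-indexed data (assumed non-empty)
        self.buffer = []                  # absorbs writes (stand-in for EWALI's dual buffers)
        self.buffer_limit = buffer_limit
        self._train()

    def _train(self):
        # Fit position ~= slope * key + intercept by least squares, then record
        # the maximum prediction error as the search bound for lookups.
        n = len(self.keys)
        xs, ys = self.keys, range(n)
        mean_x, mean_y = sum(xs) / n, (n - 1) / 2
        var = sum((x - mean_x) ** 2 for x in xs)
        self.slope = (sum((x - mean_x) * (y - mean_y)
                          for x, y in zip(xs, ys)) / var) if var else 0.0
        self.intercept = mean_y - self.slope * mean_x
        self.err = max(abs(self.slope * x + self.intercept - y)
                       for x, y in zip(xs, ys))

    def lookup(self, key):
        # Predict a position, then binary-search only within the error bound.
        pred = self.slope * key + self.intercept
        lo = max(0, int(pred - self.err) - 1)
        hi = min(len(self.keys), int(pred + self.err) + 2)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return True
        return key in self.buffer         # recent writes live in the buffer

    def insert(self, key):
        # Writes land in the buffer; merging and retraining are deferred until
        # the buffer fills (a stand-in for the delayed update mechanism).
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_limit:
            self.keys = sorted(self.keys + self.buffer)
            self.buffer = []
            self._train()

if __name__ == "__main__":
    leaf = LearnedLeaf(range(0, 1000, 2))
    leaf.insert(3)
    print(leaf.lookup(4), leaf.lookup(3), leaf.lookup(5))  # True True False
```

Because inserts only append to the buffer, the sorted array and its model stay read-only between merges, which is the intuition behind how deferred merging reduces write locking in the full scheme.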