When Learned Indexes Meet Persistent Memory: The Analysis and the Optimization
Lixiao Cui, Yijing Luo, Yusen Li, Gang Wang, Xiaoguang Liu
IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 9517-9531. Published 2023-12-25. DOI: 10.1109/TKDE.2023.3342825. https://ieeexplore.ieee.org/document/10373914/
Abstract
Emerging persistent memory (PM) is increasingly used to build high-performance persistent indexes. By exploiting the data distribution, recent learned indexes have opened up a new paradigm for index design. Some prior studies have tried to adapt learned indexes to the characteristics of PM, but they do not analyze how existing learned index schemes perform on PM. In this paper, we provide a comprehensive analysis of learned indexes on PM and propose two optimizations to improve their performance. Specifically, we evaluate ALEX, PGM-index, and XIndex after converting them into persistent indexes. With appropriate modifications, some design choices of volatile learned indexes still perform well on PM under workloads with simple data distributions, but they perform poorly when the data distribution becomes complex. Based on the experimental results, we summarize several instructive insights and optimize persistent learned indexes for complex data distributions with two methods: 1) cost-based insertion pattern selection to minimize PM writes, and 2) selective persistence of recoverable internal nodes to reduce the overhead of internal lookups. Our evaluations show that the optimized ALEX achieves 2.09x the insert performance and 1.53x the search performance of the original ALEX. It also outperforms the persistent learned index specifically designed for PM.
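The abstract does not describe the paper's cost model, so the following is only a hypothetical sketch of the general idea behind cost-based insertion pattern selection: for each insert, estimate the PM writes of each candidate pattern and pick the cheaper one. It assumes a leaf node stored as a gapped array (None marks a gap) plus an unsorted overflow buffer. The two patterns (shifting toward the nearest gap vs. appending to the buffer), the write counting, the `read_weight` penalty for later buffer scans, and all names (`Leaf`, `shift_plan`, `apply_shift`, `insert`) are illustrative assumptions, not the authors' method.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Leaf:
    # Hypothetical leaf: sorted slots with None marking gaps (gapped array).
    slots: List[Optional[int]]
    # Hypothetical unsorted overflow buffer; appending costs one PM write.
    buffer: List[int] = field(default_factory=list)
    # Counts slot writes that a real PM index would have to flush and fence.
    pm_writes: int = 0


def shift_plan(slots: List[Optional[int]], key: int) -> Optional[Tuple[int, int, int]]:
    """Return (writes, gap, p) for the cheapest in-place insert, or None if the node is full."""
    # p: first occupied slot holding a key larger than `key` (len(slots) if none).
    p = next((i for i, v in enumerate(slots) if v is not None and v > key), len(slots))
    best = None
    # Nearest gap at or right of p: shift slots[p:gap] right by one, key lands at p.
    right = next((g for g in range(p, len(slots)) if slots[g] is None), None)
    if right is not None:
        best = (right - p + 1, right, p)
    # Nearest gap left of p: shift slots[gap+1:p] left by one, key lands at p-1.
    left = next((g for g in range(p - 1, -1, -1) if slots[g] is None), None)
    if left is not None and (best is None or p - left < best[0]):
        best = (p - left, left, p)
    return best


def apply_shift(slots: List[Optional[int]], key: int, gap: int, p: int) -> None:
    """Perform the in-place insert chosen by shift_plan."""
    if gap >= p:
        slots[p + 1:gap + 1] = slots[p:gap]
        slots[p] = key
    else:
        slots[gap:p - 1] = slots[gap + 1:p]
        slots[p - 1] = key


def insert(leaf: Leaf, key: int, read_weight: float = 2.0) -> str:
    """Pick the insertion pattern with the lower estimated cost (hypothetical model)."""
    plan = shift_plan(leaf.slots, key)
    # Buffer cost: one write plus a penalty for the extra keys later lookups must scan.
    buffer_cost = 1 + read_weight * len(leaf.buffer)
    if plan is not None and plan[0] <= buffer_cost:
        writes, gap, p = plan
        apply_shift(leaf.slots, key, gap, p)
        leaf.pm_writes += writes
        return "shift"
    leaf.buffer.append(key)
    leaf.pm_writes += 1
    return "buffer"
```

For example, starting from `Leaf([10, None, 20, 30, 40, 50, None])`, inserting 15 fills the adjacent gap with one write; inserting 25 would require shifting four slots, so it goes to the empty buffer instead; inserting 45 then costs two writes by shifting, which is cheaper than the buffer's penalized cost of 3. A real persistent index would also flush the modified cache lines (e.g. with CLWB) and issue a fence, which this sketch only counts as writes.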
About the journal:
The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.