Tao Ji;Kai Zhong;Luming Sun;Yiyan Li;Cuiping Li;Hong Chen
IEEE Transactions on Knowledge and Data Engineering, vol. 37, no. 6, pp. 3499-3513, published 2025-03-05
Impact factor 8.9; JCR Q1 (Computer Science, Artificial Intelligence)
DOI: 10.1109/TKDE.2025.3548298
https://ieeexplore.ieee.org/document/10912756/
LIOF: Make the Learned Index Learn Faster With Higher Accuracy
Learned indexes, a promising alternative to traditional indexes such as the B+Tree, use machine learning models to improve query performance and reduce memory usage. However, their widespread adoption is limited by expensive training and by the high accuracy required of their internal models. Although some studies attempt to optimize the building process of learned indexes, existing methods are restrictive in scope and applicability: they are usually tailored to specific index types and rely heavily on pre-trained model knowledge, which makes deployment challenging. In this work, we introduce the Learned Index Optimization Framework (LIOF), a general and easily integrated solution that speeds up training and improves the accuracy of index models for both one-dimensional and multi-dimensional learned indexes. LIOF's optimization is intuitive: it directly provides optimized parameters for index models based on the distribution of node data. By leveraging the correlation between key distribution and node model parameters, LIOF significantly reduces the number of training epochs required for each node model. First, we introduce an optimization strategy, inspired by optimization-based meta-learning, that trains LIOF to generate optimized initial parameters for index node models. Second, we present a data-driven encoder and a parameter-centric decoder network, which adaptively translate the key distribution into a latent variable representation and decode it into an optimized node model initialization. Additionally, to further exploit the characteristics of the key distribution, we propose a monotonic regularizer and a focal loss, which guide LIOF training towards efficiency and precision. Through extensive experiments on real-world and synthetic datasets, we demonstrate that LIOF substantially improves both the training efficiency and the predictive accuracy of learned indexes.
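To make the central idea concrete, the sketch below shows why an initialization derived from the key distribution can reduce the number of training epochs for a node model. It is not the LIOF method: LIOF learns the mapping from key distribution to initial parameters with an encoder-decoder trained by meta-learning, whereas the `endpoint_init` function here is a hypothetical closed-form stand-in. The linear node model, the learning rate, the tolerance, and all function names are assumptions chosen for illustration.

```python
def normalize(keys):
    """Scale sorted, distinct keys to [0, 1] and their positions to [0, 1]."""
    lo, hi = keys[0], keys[-1]
    n = len(keys)
    xs = [(k - lo) / (hi - lo) for k in keys]
    ys = [i / (n - 1) for i in range(n)]
    return xs, ys


def mse(xs, ys, w, b):
    """Mean squared error of the linear node model pos = w * key + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, w, b, lr=0.5, tol=1e-6, max_epochs=10_000):
    """Full-batch gradient descent; returns (w, b, epochs_used)."""
    n = len(xs)
    for epoch in range(max_epochs):
        if mse(xs, ys, w, b) <= tol:
            return w, b, epoch
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_epochs


def endpoint_init(xs, ys):
    """Hypothetical stand-in for a learned initializer:
    the line through the first and last (key, position) pair."""
    w = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    return w, ys[0] - w * xs[0]


# Uniformly spaced keys: the best case for the endpoint heuristic.
keys = list(range(0, 1000, 10))
xs, ys = normalize(keys)

# Cold start: parameters initialized to zero.
_, _, cold_epochs = train(xs, ys, 0.0, 0.0)

# Warm start: parameters derived from the key distribution.
w0, b0 = endpoint_init(xs, ys)
_, _, warm_epochs = train(xs, ys, w0, b0)

print("cold-start epochs:", cold_epochs)
print("warm-start epochs:", warm_epochs)
```

On uniformly spaced keys the endpoint line already fits the data, so the warm start needs no gradient steps while the cold start does. Real key distributions are rarely this regular, which is presumably why a learned generator, rather than a fixed heuristic, is the approach the abstract describes.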
About the journal:
The IEEE Transactions on Knowledge and Data Engineering covers knowledge and data engineering aspects of computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.