A Lightweight Learned Cardinality Estimation Model

IF 10.4 · Zone 2 (Computer Science) · JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Knowledge and Data Engineering Pub Date : 2025-07-21 DOI:10.1109/TKDE.2025.3591025

Yaoyu Zhu;Jintao Zhang;Guoliang Li;Jianhua Feng

Volume 37, Issue 10, pp. 5719-5734 · https://ieeexplore.ieee.org/document/11086506/

Citations: 0

Abstract


A Lightweight Learned Cardinality Estimation Model

Cardinality estimation is a fundamental task in database management systems, aiming to predict query results accurately without executing the queries. However, existing techniques either achieve low estimation accuracy or take high inference latency. Simultaneously achieving high speed and accuracy becomes critical for the cardinality estimation problem. In this paper, we propose a novel data-driven approach called CoDe (Covering with Decompositions) to address this problem. CoDe employs the concept of covering design, which divides the table into multiple smaller, overlapping segments. For each segment, CoDe utilizes tensor decomposition to accurately model its data distribution. Moreover, CoDe introduces innovative algorithms to select the best-fitting distributions for each query, combining them to estimate the final result. By employing multiple models to approximate distributions, CoDe excels in effectively modeling discrete distributions and ensuring computational efficiency. Notably, experimental results show that our method represents a significant advancement in cardinality estimation, achieving state-of-the-art levels of both estimation accuracy and inference efficiency. Across various datasets, CoDe achieves absolute accuracy in estimating more than half of the queries.
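The abstract gives no algorithmic detail, so the following is only a toy sketch of the general idea, not the authors' CoDe implementation. It assumes a small integer-coded table, uses "all pairs of columns" as a trivial covering, stores an exact joint-count matrix per segment, and shows truncated SVD as a stand-in for tensor decomposition. The names `build_segments`, `low_rank`, and `estimate` are hypothetical.

```python
from itertools import combinations

import numpy as np


def build_segments(table, domain_sizes, t=2):
    """Build one joint-count tensor per t-subset of columns.

    Assumption: taking every t-subset is a trivial covering; the paper's
    covering design is presumably more economical.
    """
    segments = {}
    for cols in combinations(range(table.shape[1]), t):
        counts = np.zeros([domain_sizes[c] for c in cols])
        # Unbuffered add so repeated value combinations are all counted.
        np.add.at(counts, tuple(table[:, c] for c in cols), 1)
        segments[cols] = counts
    return segments


def low_rank(counts, rank):
    """Truncated SVD of a 2-D count matrix.

    Assumption: a stand-in for tensor decomposition, only valid for
    two-column segments.
    """
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def estimate(segments, predicates):
    """Estimate the row count for predicates {column: set of allowed values}.

    Uses the first segment whose columns include every predicate column.
    """
    wanted = set(predicates)
    for cols, counts in segments.items():
        if wanted <= set(cols):
            sub = counts
            for axis, c in enumerate(cols):
                if c in predicates:
                    # np.take keeps the axis, so later axes stay aligned.
                    sub = np.take(sub, sorted(predicates[c]), axis=axis)
            return float(sub.sum())
    raise ValueError("no single segment covers the query columns")


# Toy table with three binary columns (made-up data).
table = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]])
segments = build_segments(table, domain_sizes=[2, 2, 2])

print(estimate(segments, {0: {0}, 1: {0}}))  # col0 = 0 AND col1 = 0 -> 2.0
print(estimate(segments, {2: {1}}))          # col2 = 1 -> 3.0
```

A query touching all three columns raises `ValueError` here, because no pairwise segment covers it. According to the abstract, CoDe handles such cases by selecting and combining several fitted distributions; this sketch does not attempt that step.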


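Source journal

IEEE Transactions on Knowledge and Data Engineering (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 11.70

Self-citation rate: 3.40%

Articles published: 515

Review time: 6 months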

About the journal: The IEEE Transactions on Knowledge and Data Engineering encompasses knowledge and data engineering aspects within computer science, artificial intelligence, electrical engineering, computer engineering, and related fields. It provides an interdisciplinary platform for disseminating new developments in knowledge and data engineering and explores the practicality of these concepts in both hardware and software. Specific areas covered include knowledge-based and expert systems, AI techniques for knowledge and data management, tools, and methodologies, distributed processing, real-time systems, architectures, data management practices, database design, query languages, security, fault tolerance, statistical databases, algorithms, performance evaluation, and applications.