基于机器学习的基数估计的混合谓词组合查询的增强特征

Magnus Müller, Lucas Woltmann, Wolfgang Lehner
{"title":"基于机器学习的基数估计的混合谓词组合查询的增强特征","authors":"Magnus Müller, Lucas Woltmann, Wolfgang Lehner","doi":"10.48786/edbt.2023.22","DOIUrl":null,"url":null,"abstract":"Estimating query result sizes is a critical task in areas like query optimization. For some years now it has been popular to apply machine learning to this problem. However, surprisingly, there has been very little research yet on how to present queries to a machine learning model. Machine learning models do not simply consume SQL strings. Instead, a SQL string is transformed into a numerical representation. This transformation is called query featurization and is defined by a query featurization technique (QFT). This paper is concerned with QFTs for queries with many selection predicates. In particular, we consider queries that contain both predicates over different attributes and multiple predicates per attribute. We identify a desired property of query featurization and present three novel QFTs. To the best of our knowledge, we are the first to featurize queries with mixed combinations of predicates, i.e., containing both conjunctions and disjunctions. Our QFTs are model-independent and can serve as the query featurization layer for different machine learning model types. In our evaluation, we combine our QFTs with three different machine learning models. We demonstrate that the estimation accuracy of machine learning models significantly depends on the QFT used. In addition, we compare our best combination of QFT and machine learning model to various existing cardinality estimators.","PeriodicalId":88813,"journal":{"name":"Advances in database technology : proceedings. International Conference on Extending Database Technology","volume":"17 1","pages":"273-284"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Enhanced Featurization of Queries with Mixed Combinations of Predicates for ML-based Cardinality Estimation\",\"authors\":\"Magnus Müller, Lucas Woltmann, Wolfgang Lehner\",\"doi\":\"10.48786/edbt.2023.22\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Estimating query result sizes is a critical task in areas like query optimization. For some years now it has been popular to apply machine learning to this problem. However, surprisingly, there has been very little research yet on how to present queries to a machine learning model. Machine learning models do not simply consume SQL strings. Instead, a SQL string is transformed into a numerical representation. This transformation is called query featurization and is defined by a query featurization technique (QFT). This paper is concerned with QFTs for queries with many selection predicates. In particular, we consider queries that contain both predicates over different attributes and multiple predicates per attribute. We identify a desired property of query featurization and present three novel QFTs. To the best of our knowledge, we are the first to featurize queries with mixed combinations of predicates, i.e., containing both conjunctions and disjunctions. Our QFTs are model-independent and can serve as the query featurization layer for different machine learning model types. In our evaluation, we combine our QFTs with three different machine learning models. We demonstrate that the estimation accuracy of machine learning models significantly depends on the QFT used. In addition, we compare our best combination of QFT and machine learning model to various existing cardinality estimators.\",\"PeriodicalId\":88813,\"journal\":{\"name\":\"Advances in database technology : proceedings. International Conference on Extending Database Technology\",\"volume\":\"17 1\",\"pages\":\"273-284\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Advances in database technology : proceedings. International Conference on Extending Database Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.48786/edbt.2023.22\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in database technology : proceedings. International Conference on Extending Database Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48786/edbt.2023.22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

估计查询结果大小在查询优化等领域是一项关键任务。几年来,将机器学习应用于这个问题已经很流行了。然而,令人惊讶的是,关于如何向机器学习模型呈现查询的研究很少。机器学习模型并不是简单地使用SQL字符串。相反,SQL字符串被转换成数字表示形式。这种转换称为查询特征化,由查询特征化技术(QFT)定义。本文研究具有许多选择谓词的查询的qft。特别是,我们考虑的查询既包含不同属性上的谓词,也包含每个属性上的多个谓词。我们确定了查询特征化的一个期望属性,并提出了三个新的qft。据我们所知,我们是第一个使用混合谓词组合的查询,即同时包含连词和析词的查询。我们的qft是模型独立的,可以作为不同机器学习模型类型的查询特征层。在我们的评估中,我们将qft与三种不同的机器学习模型结合起来。我们证明了机器学习模型的估计精度在很大程度上取决于所使用的QFT。此外,我们将QFT和机器学习模型的最佳组合与各种现有的基数估计器进行了比较。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Enhanced Featurization of Queries with Mixed Combinations of Predicates for ML-based Cardinality Estimation
Estimating query result sizes is a critical task in areas like query optimization. For some years now it has been popular to apply machine learning to this problem. However, surprisingly, there has been very little research yet on how to present queries to a machine learning model. Machine learning models do not simply consume SQL strings. Instead, a SQL string is transformed into a numerical representation. This transformation is called query featurization and is defined by a query featurization technique (QFT). This paper is concerned with QFTs for queries with many selection predicates. In particular, we consider queries that contain both predicates over different attributes and multiple predicates per attribute. We identify a desired property of query featurization and present three novel QFTs. To the best of our knowledge, we are the first to featurize queries with mixed combinations of predicates, i.e., containing both conjunctions and disjunctions. Our QFTs are model-independent and can serve as the query featurization layer for different machine learning model types. In our evaluation, we combine our QFTs with three different machine learning models. We demonstrate that the estimation accuracy of machine learning models significantly depends on the QFT used. In addition, we compare our best combination of QFT and machine learning model to various existing cardinality estimators.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信