根据表的全外连接样本估算查询的卡片数

Informacionnye Tehnologii Pub Date : 2023-12-20 DOI:10.17587/it.29.650-663

U. A. Grigorev

{"title":"根据表的全外连接样本估算查询的卡片数","authors":"U. A. Grigorev","doi":"10.17587/it.29.650-663","DOIUrl":null,"url":null,"abstract":"Cardinality estimation (CardEst) plays an important role in creating high-quality query execution plans in the DBMS. In the last decade, a large number of methods have been developed: traditional methods (histograms, samples), machine learning methods based on queries or data. But all of them are based on different restrictions and assumptions, and the cardinality estimation with their help worsens with an increase in the number of joined tables. The article proposes two new methods based on the theory of approximate calculation of aggregates and allowing to remove most of the restrictions. Method 1 fetches blocks after executing the subqueries of the original query and does not require a preliminary analysis of the filter conditions. The condition for joining tables can be arbitrary (not necessarily the equality of attributes). Method 2 allows you to calculate the probabilities of reading blocks based on the metadata accumulated in the process of populating the database. Metadata takes up little memory, and the overhead of maintaining it is low. Method 2, in contrast to method 1, takes into account the true cardinality of the connection of neighboring blocks in the sample obtained from metadata. Therefore, the prospect opens up for a more accurate assessment of the cardinality of joining a large number of tables. Method 1 (EVACAR method) is implemented and compared with modern machine learning methods BayesCard, DeepDB, FLAT on a special STATS test. The results of the experiments confirmed the effectiveness of the EVACAR method. The EVACAR method is more accurate or its maximum q-error is comparable to machine learning methods for 75-88 % of evaluated queries (subplans). In the future, it is planned to implement the 2nd method for assessing the cardinality of queries.","PeriodicalId":504905,"journal":{"name":"Informacionnye Tehnologii","volume":"24 7","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Estimating the Cardinality of Queries Based on a Sample from a Full Outer Join of Tables\",\"authors\":\"U. A. Grigorev\",\"doi\":\"10.17587/it.29.650-663\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cardinality estimation (CardEst) plays an important role in creating high-quality query execution plans in the DBMS. In the last decade, a large number of methods have been developed: traditional methods (histograms, samples), machine learning methods based on queries or data. But all of them are based on different restrictions and assumptions, and the cardinality estimation with their help worsens with an increase in the number of joined tables. The article proposes two new methods based on the theory of approximate calculation of aggregates and allowing to remove most of the restrictions. Method 1 fetches blocks after executing the subqueries of the original query and does not require a preliminary analysis of the filter conditions. The condition for joining tables can be arbitrary (not necessarily the equality of attributes). Method 2 allows you to calculate the probabilities of reading blocks based on the metadata accumulated in the process of populating the database. Metadata takes up little memory, and the overhead of maintaining it is low. Method 2, in contrast to method 1, takes into account the true cardinality of the connection of neighboring blocks in the sample obtained from metadata. Therefore, the prospect opens up for a more accurate assessment of the cardinality of joining a large number of tables. Method 1 (EVACAR method) is implemented and compared with modern machine learning methods BayesCard, DeepDB, FLAT on a special STATS test. The results of the experiments confirmed the effectiveness of the EVACAR method. The EVACAR method is more accurate or its maximum q-error is comparable to machine learning methods for 75-88 % of evaluated queries (subplans). In the future, it is planned to implement the 2nd method for assessing the cardinality of queries.\",\"PeriodicalId\":504905,\"journal\":{\"name\":\"Informacionnye Tehnologii\",\"volume\":\"24 7\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-12-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Informacionnye Tehnologii\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.17587/it.29.650-663\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Informacionnye Tehnologii","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.17587/it.29.650-663","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在数据库管理系统中创建高质量的查询执行计划时，卡品质估计（CardEst）发挥着重要作用。近十年来，人们开发了大量方法：传统方法（直方图、样本）、基于查询或数据的机器学习方法。但是，所有这些方法都基于不同的限制和假设，而且在这些方法的帮助下，随着连接表数量的增加，万有性估计也会恶化。本文提出了两种基于近似计算聚合理论的新方法，可以消除大部分限制。方法 1 在执行原始查询的子查询后获取数据块，不需要对过滤条件进行初步分析。连接表的条件可以是任意的（不一定是属性相等）。方法 2 可以根据数据库填充过程中积累的元数据计算读取数据块的概率。元数据占用的内存很少，维护元数据的开销也很低。与方法 1 不同的是，方法 2 考虑到了从元数据中获得的样本中相邻区块连接的真实万有引力。因此，可以更准确地评估连接大量表的冗余度。方法 1（EVACAR 方法）已经实现，并在一个特殊的 STATS 测试中与现代机器学习方法 BayesCard、DeepDB、FLAT 进行了比较。实验结果证实了 EVACAR 方法的有效性。在 75-88 % 的评估查询（子计划）中，EVACAR 方法更准确，或其最大 q 误差与机器学习方法相当。今后，我们计划采用第二种方法来评估查询的卡带性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Estimating the Cardinality of Queries Based on a Sample from a Full Outer Join of Tables

Cardinality estimation (CardEst) plays an important role in creating high-quality query execution plans in the DBMS. In the last decade, a large number of methods have been developed: traditional methods (histograms, samples), machine learning methods based on queries or data. But all of them are based on different restrictions and assumptions, and the cardinality estimation with their help worsens with an increase in the number of joined tables. The article proposes two new methods based on the theory of approximate calculation of aggregates and allowing to remove most of the restrictions. Method 1 fetches blocks after executing the subqueries of the original query and does not require a preliminary analysis of the filter conditions. The condition for joining tables can be arbitrary (not necessarily the equality of attributes). Method 2 allows you to calculate the probabilities of reading blocks based on the metadata accumulated in the process of populating the database. Metadata takes up little memory, and the overhead of maintaining it is low. Method 2, in contrast to method 1, takes into account the true cardinality of the connection of neighboring blocks in the sample obtained from metadata. Therefore, the prospect opens up for a more accurate assessment of the cardinality of joining a large number of tables. Method 1 (EVACAR method) is implemented and compared with modern machine learning methods BayesCard, DeepDB, FLAT on a special STATS test. The results of the experiments confirmed the effectiveness of the EVACAR method. The EVACAR method is more accurate or its maximum q-error is comparable to machine learning methods for 75-88 % of evaluated queries (subplans). In the future, it is planned to implement the 2nd method for assessing the cardinality of queries.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Informacionnye Tehnologii

自引率

0.00%

发文量