A new tool for multi-level partitioning in Teradata

Proceedings of the 21st ACM International Conference on Information and Knowledge Management. Pub Date: 2012-10-29. DOI: 10.1145/2396761.2398604

Young-Kyoon Suh, A. Ghazal, A. Crolotte, Pekka Kostamaa


Citations: 5

Abstract

This paper introduces a new tool that recommends an optimized partitioning solution, called a Multi-Level Partitioned Primary Index (MLPPI), for a fact table based on the queries in the workload. The tool implements a new technique that uses a greedy algorithm to enumerate the search space, which is driven by the predicates in the queries. This technique fits the Teradata MLPPI scheme very well, as it is based on a general framework that uses general expressions, ranges and case expressions for partition definitions. The cost model implemented in the tool is based on the Teradata optimizer and is used to prune the search space on the way to a final solution. The tool resides completely on the client and interfaces with the database through APIs, as opposed to previous work that requires extending the optimizer code. The APIs are used to simplify the workload queries and to capture the fact table predicates and costs needed to make the recommendation. The predicate-driven method implemented by the tool is general and can be applied to any clustering or partitioning scheme based on simple field expressions or complex SQL predicates. Experimental results on a particular workload show that the tool's recommendation outperforms one made by a human expert. The experiments also show that the solution scales with both the complexity of the workload and the size of the fact table.
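The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a predicate-driven greedy search with cost-based pruning might look like. Everything in it is an assumption made for illustration: the names `greedy_partitioning` and `toy_cost`, the column names, the selectivities, and the row count are hypothetical, and the toy cost function merely stands in for the estimates that the tool obtains from the Teradata optimizer through its APIs.

```python
from typing import Callable, FrozenSet, Iterable, List

# Hypothetical workload: each query maps a fact-table column it filters on
# to the fraction of rows its predicate keeps (its selectivity).
TABLE_ROWS = 1_000_000
WORKLOAD = [
    {"sale_date": 0.01, "region": 0.1},
    {"sale_date": 0.05},
    {"store_id": 0.5},
]


def toy_cost(partition_cols: FrozenSet[str]) -> float:
    """Stand-in cost model: rows scanned across the workload.

    A predicate only reduces the scan if its column is a partitioning
    column (partition elimination). A real tool would ask the optimizer.
    """
    total = 0.0
    for query in WORKLOAD:
        fraction = 1.0
        for col, selectivity in query.items():
            if col in partition_cols:
                fraction *= selectivity
        total += TABLE_ROWS * fraction
    return total


def greedy_partitioning(
    candidates: Iterable[str],
    cost: Callable[[FrozenSet[str]], float],
    max_levels: int,
) -> List[str]:
    """Pick one partitioning level per round, always the cheapest extension."""
    chosen: List[str] = []
    remaining = set(candidates)
    best_cost = cost(frozenset())
    while remaining and len(chosen) < max_levels:
        # Score every way of adding one more level to the current choice.
        scored = [(cost(frozenset(chosen + [c])), c) for c in sorted(remaining)]
        new_cost, best = min(scored)
        if new_cost >= best_cost:
            break  # no candidate lowers the cost: prune the rest of the search
        chosen.append(best)
        remaining.remove(best)
        best_cost = new_cost
    return chosen


levels = greedy_partitioning(
    ["sale_date", "region", "store_id", "product_id"], toy_cost, max_levels=4
)
print(levels)  # ['sale_date', 'store_id', 'region']
```

In this toy run the search stops after three levels because `product_id` appears in no query predicate, so adding it cannot lower the estimated cost. A greedy search evaluates only a small part of the space of level combinations, which is the usual reason for choosing it over exhaustive enumeration; it does not guarantee the globally cheapest combination.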

