Adaptive Optimization of Very Large Join Queries

Proceedings of the 2018 International Conference on Management of Data. Pub Date: 2018-05-27. DOI: 10.1145/3183713.3183733

Thomas Neumann, Bernhard Radke


Citations: 52

Abstract

The use of business intelligence tools and other means to generate queries has led to great variety in the size of join queries. While most queries are reasonably small, join queries with up to a hundred relations are no longer exotic, and the distribution of query sizes has an incredibly long tail. The largest real-world query that we are aware of accesses more than 4,000 relations. This large spread makes query optimization very challenging. Join ordering is known to be NP-hard, which means that we cannot hope to solve such large problems exactly. On the other hand, most queries are much smaller, and there is no reason to sacrifice optimality there. This paper introduces an adaptive optimization framework that is able to solve most common join queries exactly, while simultaneously scaling to queries with thousands of joins. A key component is a novel search space linearization technique that leads to near-optimal execution plans for large classes of queries. In addition, we describe implementation techniques that are necessary to scale join ordering algorithms to these extremely large queries. Extensive experiments with over 10 different approaches show that the new adaptive approach proposed here performs excellently over a huge spectrum of query sizes, and produces optimal or near-optimal solutions for most common queries.
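The adaptive idea in the abstract (exact optimization for small queries, a scalable fallback for very large ones) can be illustrated with a minimal sketch. This is not the paper's algorithm: the size threshold `exact_limit`, the cost model (sum of intermediate result sizes), the exhaustive subset DP, and the greedy fallback that always joins the pair with the smallest result are all assumptions chosen for illustration, and the search space linearization technique is not modeled here.

```python
from itertools import combinations
from math import prod

def cardinality(rels, cards, sel):
    """Estimated result size of joining `rels`: product of base sizes times
    the selectivities of all predicates inside the set (keys are sorted pairs)."""
    size = prod(cards[r] for r in rels)
    for a, b in combinations(sorted(rels), 2):
        size *= sel.get((a, b), 1.0)  # no predicate means a cross product
    return size

def exact_dp(cards, sel):
    """Exhaustive DP over all subsets (bushy trees, cross products allowed).
    Exponential in the number of relations, so only usable for small queries."""
    names = sorted(cards)
    best = {frozenset([r]): (0.0, r) for r in names}
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            s = frozenset(combo)
            out = cardinality(s, cards, sel)
            first, rest = combo[0], combo[1:]
            # Pin `first` to the left side so each split is visited once.
            for i in range(len(rest)):
                for extra in combinations(rest, i):
                    left = frozenset((first,) + extra)
                    right = s - left
                    cost = best[left][0] + best[right][0] + out
                    if s not in best or cost < best[s][0]:
                        best[s] = (cost, (best[left][1], best[right][1]))
    return best[frozenset(names)]

def greedy(cards, sel):
    """Heuristic fallback: repeatedly join the two subtrees whose combined
    result is smallest. Polynomial time, but not guaranteed to be optimal."""
    trees = {frozenset([r]): (0.0, r) for r in cards}
    while len(trees) > 1:
        a, b = min(combinations(trees, 2),
                   key=lambda p: cardinality(p[0] | p[1], cards, sel))
        out = cardinality(a | b, cards, sel)
        (cost_a, plan_a), (cost_b, plan_b) = trees.pop(a), trees.pop(b)
        trees[a | b] = (cost_a + cost_b + out, (plan_a, plan_b))
    return next(iter(trees.values()))

def optimize(cards, sel, exact_limit=10):
    """Adaptive dispatch: exact search for small queries, heuristic otherwise.
    `exact_limit` is a hypothetical threshold, not a value from the paper."""
    if len(cards) <= exact_limit:
        return exact_dp(cards, sel)
    return greedy(cards, sel)

if __name__ == "__main__":
    cards = {"A": 1000, "B": 10, "C": 100}
    sel = {("A", "B"): 0.01, ("B", "C"): 0.1}
    print(optimize(cards, sel))
```

Both functions return a `(cost, plan)` pair, where the plan is a nested tuple of relation names. Dispatching on relation count alone is the simplest possible policy; a production optimizer would more plausibly decide based on the size of the search space it actually has to explore.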

