Christoph Anneser, Nesime Tatbul, David Cohen, Zhenggang Xu, Prithviraj Pandian, Nikolay Laptev, Ryan Marcus
{"title":"AutoSteer: Learned Query Optimization for Any SQL Database","authors":"Christoph Anneser, Nesime Tatbul, David Cohen, Zhenggang Xu, Prithviraj Pandian, Nikolay Laptev, Ryan Marcus","doi":"10.14778/3611540.3611544","DOIUrl":null,"url":null,"abstract":"This paper presents AutoSteer, a learning-based solution that automatically drives query optimization in any SQL database that exposes tunable optimizer knobs. AutoSteer builds on the Bandit optimizer (Bao) and extends it with new capabilities (e.g., automated hint-set discovery) to minimize integration effort and facilitate usability in both monolithic and disaggregated SQL systems. We successfully applied AutoSteer on PostgreSQL, PrestoDB, Spark-SQL, MySQL, and DuckDB - five popular open-source database engines with diverse query optimizers. We then conducted a detailed experimental evaluation with public benchmarks (JOB, Stackoverflow, TPC-DS) and a production workload from Meta's PrestoDB deployments. Our evaluation shows that AutoSteer can not only outperform these engines' native query optimizers (e.g., up to 40% improvements for PrestoDB) but can also match the performance of Bao-for-PostgreSQL with reduced human supervision and increased adaptivity, as it replaces Bao's static, expert-picked hint-sets with those that are automatically discovered. We also provide an open-source implementation of AutoSteer together with a visual tool for interactive use by query optimization experts.","PeriodicalId":54220,"journal":{"name":"Proceedings of the Vldb Endowment","volume":"82 1","pages":"0"},"PeriodicalIF":2.6000,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Vldb Endowment","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14778/3611540.3611544","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 1
Abstract
This paper presents AutoSteer, a learning-based solution that automatically drives query optimization in any SQL database that exposes tunable optimizer knobs. AutoSteer builds on the Bandit optimizer (Bao) and extends it with new capabilities (e.g., automated hint-set discovery) to minimize integration effort and facilitate usability in both monolithic and disaggregated SQL systems. We successfully applied AutoSteer on PostgreSQL, PrestoDB, Spark-SQL, MySQL, and DuckDB - five popular open-source database engines with diverse query optimizers. We then conducted a detailed experimental evaluation with public benchmarks (JOB, Stackoverflow, TPC-DS) and a production workload from Meta's PrestoDB deployments. Our evaluation shows that AutoSteer can not only outperform these engines' native query optimizers (e.g., up to 40% improvements for PrestoDB) but can also match the performance of Bao-for-PostgreSQL with reduced human supervision and increased adaptivity, as it replaces Bao's static, expert-picked hint-sets with those that are automatically discovered. We also provide an open-source implementation of AutoSteer together with a visual tool for interactive use by query optimization experts.
本文介绍了AutoSteer,这是一种基于学习的解决方案,可以在任何SQL数据库中自动驱动可调优化器旋钮的查询优化。AutoSteer建立在Bandit优化器(Bao)的基础上,并扩展了它的新功能(例如,自动提示集发现),以最大限度地减少集成工作,并促进单片和分解SQL系统的可用性。我们成功地将AutoSteer应用于PostgreSQL、PrestoDB、Spark-SQL、MySQL和DuckDB这五种流行的开源数据库引擎,它们具有不同的查询优化器。然后,我们使用公共基准测试(JOB、Stackoverflow、TPC-DS)和Meta PrestoDB部署的生产工作负载进行了详细的实验评估。我们的评估表明,AutoSteer不仅可以胜过这些引擎的原生查询优化器(例如,PrestoDB的性能提高了40%),而且还可以在减少人工监督和提高适应性的情况下与Bao for postgresql的性能相匹配,因为它用自动发现的提示集取代了Bao的静态、专家挑选的提示集。我们还提供了AutoSteer的开源实现以及一个可视化工具,供查询优化专家进行交互使用。
期刊介绍:
The Proceedings of the VLDB (PVLDB) welcomes original research papers on a broad range of research topics related to all aspects of data management, where systems issues play a significant role, such as data management system technology and information management infrastructures, including their very large scale of experimentation, novel architectures, and demanding applications as well as their underpinning theory. The scope of a submission for PVLDB is also described by the subject areas given below. Moreover, the scope of PVLDB is restricted to scientific areas that are covered by the combined expertise on the submission’s topic of the journal’s editorial board. Finally, the submission’s contributions should build on work already published in data management outlets, e.g., PVLDB, VLDBJ, ACM SIGMOD, IEEE ICDE, EDBT, ACM TODS, IEEE TKDE, and go beyond a syntactic citation.