PerfOrator: eloquent performance models for Resource Optimization

Proceedings of the Seventh ACM Symposium on Cloud Computing Pub Date : 2016-10-05 DOI:10.1145/2987550.2987566

K. Rajan, Dharmesh Kakadia, C. Curino, Subru Krishnan

Citations: 64

Abstract

Query Optimization focuses on finding the best query execution plan, given fixed hardware resources. In BigData settings, on both pay-as-you-go clouds and on-prem shared clusters, a complementary challenge emerges, Resource Optimization: finding the best hardware resources, given an execution plan. In this world, provisioning is almost instantaneous, and time-varying resources can be acquired on a per-query basis. This allows us to optimize allocations for completion time, resource usage, dollar cost, etc. These optimizations have a huge impact on performance and cost, and they pivot around a core challenge: faithful resource-to-performance models for arbitrary BigData queries. Building such models is challenging for users and tools alike because of the lack of good statistics (high-velocity, unstructured data), the frequent use of UDFs, the impact of different hardware types on performance, and a lack of understanding of parallel execution at such a scale. We address this with PerfOrator, a novel approach to resource-to-performance modeling. PerfOrator employs nonlinear regression on profile runs to model arbitrary UDFs, calibration queries to generalize across hardware platforms, and analytical framework models to account for parallelism. The resulting estimates are orders of magnitude more accurate than existing approaches (e.g., Hive's optimizer), and have been successfully employed in two resource optimization scenarios: 1) optimizing the provisioning of clusters in cloud settings, with decisions within 1% of optimal, and 2) reserving a skyline of resources for SLA jobs, with accuracies over 10x better than human experts.
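
To make the idea of a resource-to-performance model concrete, here is a minimal sketch in Python. It is not PerfOrator's implementation: the abstract does not give the model form, so the power-law curve, the log-log least-squares fit (a simple stand-in for nonlinear regression), the wave-based parallelism model, and all function names and profile numbers below are assumptions made purely for illustration.

```python
import math

def fit_power_law(samples):
    """Fit t = a * n**b by least squares on (log n, log t).

    samples: list of (input_size, seconds) pairs from hypothetical profile runs.
    """
    xs = [math.log(n) for n, _ in samples]
    ys = [math.log(t) for _, t in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def predict_runtime(a, b, input_size, containers, num_tasks):
    """Assumed analytical model: equal-sized tasks run in waves of `containers`."""
    per_task = a * (input_size / num_tasks) ** b
    waves = math.ceil(num_tasks / containers)
    return waves * per_task

def cheapest_allocation(a, b, input_size, num_tasks, deadline, max_containers):
    """Return (containers, runtime, container_seconds) with the lowest
    container-seconds among allocations that meet the deadline, or None."""
    best = None
    for c in range(1, max_containers + 1):
        runtime = predict_runtime(a, b, input_size, c, num_tasks)
        usage = c * runtime
        if runtime <= deadline and (best is None or usage < best[2]):
            best = (c, runtime, usage)
    return best

# Made-up profile runs (input size in GB, seconds), generated from t = 2 * n**1.5.
profile_runs = [(1, 2.0), (4, 16.0), (16, 128.0)]
a, b = fit_power_law(profile_runs)
best = cheapest_allocation(a, b, input_size=400, num_tasks=100,
                           deadline=100, max_containers=100)
```

The split mirrors the abstract's structure at toy scale: a regression step learns per-task cost from small profile runs, an analytical step accounts for parallelism, and a search over allocations then trades completion time against resource usage.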
