Improving Data-Analytics Performance Via Autonomic Control of Concurrency and Resource Units

ACM Transactions on Autonomous and Adaptive Systems (TAAS), Pub Date: 2019-03-15, DOI: 10.1145/3309539

Gil Jae Lee, J. Fortes

Citations: 6

Abstract

Many big-data processing jobs use data-analytics frameworks such as Apache Hadoop (currently also known as YARN). Such frameworks have tunable configuration parameters that are set by experienced system administrators and/or job developers. However, tuning parameters manually can be hard and time-consuming because it requires domain-specific knowledge and an understanding of complex interdependencies among parameters. Most frameworks seek efficient resource management by assigning resource units to jobs, with the maximum number of units allowed in a system being part of its static configuration. This static resource management has limited effectiveness in coping with job diversity and workload dynamics, even in the case of a single job. The work reported in this article seeks to improve performance (e.g., multi-job makespan and job completion time) without modifying either the framework or the applications, while avoiding the problems of previous self-tuning approaches based on performance models or resource usage. These problems include (1) the need for time-consuming training, typically offline, and (2) unsuitability for multi-job/multi-tenant environments. This article proposes a hierarchical self-tuning approach using (1) a fuzzy-logic controller to dynamically adjust the maximum number of concurrent jobs and (2) additional controllers (one for each cluster node) to adjust the maximum number of resource units assigned to jobs on each node. The fuzzy-logic controller uses fuzzy rules based on a concave-downward relationship between aggregate CPU usage and the number of concurrent jobs. The other controllers use a heuristic algorithm to adjust the number of resource units on the basis of both CPU and disk I/O usage by jobs. To manage the maximum number of available resource units in each node, the controllers also take resource usage by other processes (e.g., system processes) into account. A prototype of our approach was implemented for Apache Hadoop on a cluster running at CloudLab. The proposed approach was demonstrated and evaluated with workloads composed of jobs with similar resource usage patterns as well as other realistic mixed-pattern workloads synthesized by SWIM, a statistical workload injector for MapReduce. The evaluation shows that the proposed approach reduces the jobs' makespan by up to 48% relative to Hadoop's default settings.
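
The two control levels described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the membership ranges, thresholds, step sizes, and the function names `fuzzy_concurrency_step` and `adjust_node_units` are all assumptions chosen for illustration. The first function applies three fuzzy rules that climb toward the peak of a concave-downward CPU-usage curve; the second applies a simple threshold heuristic on CPU and disk I/O usage that also counts usage by non-job processes.

```python
def clamp01(x):
    """Clamp a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def fuzzy_concurrency_step(d_cpu, last_step, span=20.0, max_step=2):
    """Suggest a change to the maximum number of concurrent jobs.

    d_cpu: change in aggregate CPU usage (percentage points) observed
           after the previous adjustment.
    last_step: the previous adjustment (positive = more jobs).
    span, max_step: hypothetical tuning constants, not from the paper.
    """
    direction = 1 if last_step >= 0 else -1

    # Fuzzify: degree to which CPU usage rose, fell, or stayed flat.
    rose = clamp01(d_cpu / span)
    fell = clamp01(-d_cpu / span)
    flat = 1.0 - max(rose, fell)

    # Rules (hill climbing on a concave-downward curve):
    #   usage rose -> keep moving in the same direction
    #   usage flat -> hold the current limit
    #   usage fell -> the peak was passed, so reverse direction
    outputs = [
        (rose, direction * max_step),
        (flat, 0),
        (fell, -direction * max_step),
    ]

    # Defuzzify with a weighted average of the rule outputs.
    total_weight = sum(weight for weight, _ in outputs)
    crisp = sum(weight * value for weight, value in outputs) / total_weight
    return round(crisp)


def adjust_node_units(units, job_cpu, job_io, other_cpu, other_io,
                      low=60.0, high=90.0, min_units=1, max_units=32):
    """Adjust one node's maximum number of resource units.

    All usage values are percentages of node capacity. Usage by other
    processes (e.g., system processes) is added to the jobs' usage.
    The thresholds and unit bounds are hypothetical.
    """
    total_cpu = job_cpu + other_cpu
    total_io = job_io + other_io

    if total_cpu > high or total_io > high:
        units -= 1  # a resource is saturated: shrink the limit
    elif total_cpu < low and total_io < low:
        units += 1  # both resources have headroom: grow the limit

    return max(min_units, min(max_units, units))
```

In this sketch a cluster-level loop would call `fuzzy_concurrency_step` once per control interval, while each node would call `adjust_node_units` independently with its own measurements, mirroring the hierarchy the abstract describes.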

Source journal

ACM Transactions on Autonomous and Adaptive Systems (TAAS)

Self-citation rate

0.00%

Number of publications