Forecasting a Storm: Divining Optimal Configurations using Genetic Algorithms and Supervised Learning

2019 IEEE International Conference on Autonomic Computing (ICAC). Pub Date: 2019-06-01. DOI: 10.1109/ICAC.2019.00025

Michael Trotter, Timothy Wood, Jinho Hwang


Citations: 12

Abstract

With the advent of Big Data platforms like Apache Storm, computations once deemed infeasible locally become possible at scale. However, doing so entails orchestrating powerful yet expensive clusters. With its focus on stream processing, Storm optimizes for low latency and high throughput. To realize this goal and thereby maximize the utility of these clusters' resources, operators must execute these tasks under their optimal configurations. Yet the search space for such configurations is so vast and time-consuming to explore as to be effectively intractable, owing to the temporal overhead of testing new candidate configurations, the sheer number of permutations of parameters within each configuration, and the interdependence of those parameters. To cover the search space efficiently, we automate the process with genetic algorithms. Moreover, we fuse this technique not only with additional cluster information gleaned from JMX profiling and Storm performance data but also with classifiers constructed from training data from past executions of a plethora of Storm topologies. Using a diverse set of Storm benchmark topologies as evaluation data, we show that the fully enhanced genetic algorithms can efficiently find configurations that perform on average 4.67x better than manual baselines derived from "rules of thumb". Moreover, we demonstrate that our fully refined classifiers enhance the GA throughput on average across the topologies by 22% while reducing search time by a factor of 6.47.
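To make the search idea in the abstract concrete, the following is a minimal sketch of a genetic algorithm over a configuration space, with an optional filter standing in for a trained classifier. It is not the authors' implementation. The parameter names and ranges in `SEARCH_SPACE`, the `toy_fitness` function, and the `predict_promising` callable are all hypothetical; in the setting the abstract describes, fitness would come from running a topology on a cluster and measuring it, which is exactly the expensive step such a filter is meant to avoid.

```python
import random

# Hypothetical search space (assumption, not from the paper):
# parameter name -> inclusive integer range.
SEARCH_SPACE = {
    "workers": (1, 16),
    "spout_parallelism": (1, 32),
    "bolt_parallelism": (1, 64),
    "max_spout_pending": (100, 10000),
}

# Made-up optimum used only by the toy fitness function below.
TOY_OPTIMUM = {"workers": 8, "spout_parallelism": 12,
               "bolt_parallelism": 40, "max_spout_pending": 2000}


def toy_fitness(cfg):
    """Stand-in for a measured throughput: 0 at TOY_OPTIMUM, negative elsewhere."""
    return -sum(((cfg[k] - TOY_OPTIMUM[k]) / (hi - lo)) ** 2
                for k, (lo, hi) in SEARCH_SPACE.items())


def random_config(rng):
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each parameter is inherited from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(cfg, rng, rate=0.2):
    # Resample each parameter within its range with probability `rate`.
    out = dict(cfg)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            out[k] = rng.randint(lo, hi)
    return out


def genetic_search(fitness, predict_promising=None,
                   pop_size=20, generations=15, seed=0):
    """Return (best_config, best_score) found by a simple elitist GA."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    cache = {}  # avoid re-evaluating identical configurations

    def score(cfg):
        key = tuple(sorted(cfg.items()))
        if key not in cache:
            cache[key] = fitness(cfg)
        return cache[key]

    best = max(population, key=score)
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0]]              # elitism: keep the current best
        attempts = 0
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = mutate(crossover(a, b, rng), rng)
            attempts += 1
            # Optional pre-filter: skip candidates the (hypothetical) classifier
            # rejects, but stop filtering if it rejects too many in a row.
            if (predict_promising is None or predict_promising(child)
                    or attempts > 10 * pop_size):
                children.append(child)
        population = children
        best = max(population + [best], key=score)
    return best, score(best)


if __name__ == "__main__":
    cfg, fit = genetic_search(toy_fitness)
    print(cfg, fit)
```

In this sketch the filter is any callable that returns `True` for configurations worth evaluating. A real system would replace `toy_fitness` with a measurement harness and `predict_promising` with a model trained on past runs.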

