Forecasting a Storm: Divining Optimal Configurations using Genetic Algorithms and Supervised Learning

Michael Trotter, Timothy Wood, Jinho Hwang
{"title":"Forecasting a Storm: Divining Optimal Configurations using Genetic Algorithms and Supervised Learning","authors":"Michael Trotter, Timothy Wood, Jinho Hwang","doi":"10.1109/ICAC.2019.00025","DOIUrl":null,"url":null,"abstract":"With the advent of Big Data platforms like Apache Storm, computations once deemed infeasible locally become possible at scale. However, doing so entails orchestrating powerful yet expensive clusters. With its focus on stream processing, Storm optimizes for low-latency and high throughput. However, to realize this goal and thereby maximize the utility of these clusters' resources, operators must execute these tasks under their optimal configurations. Yet, the search space for finding such configurations is so vast and time-consuming to explore so as to be effectively intractable due to issues like the temporal overhead of testing new candidate configurations, the sheer number of permutations of parameters within each configuration and their interdependence among each other. In order to efficiently cover the search space, we automate the process with genetic algorithms. Moreover, we fuse this technique not only with additional cluster information gleaned from JMX profiling and Storm performance data but also with classifiers constructed from training data from past executions of a plethora of Storm topologies. Utilizing a diverse set of Storm benchmark topologies as evaluation data, we show that the fully enhanced genetic algorithms can efficiently find configurations that perform on average 4.67x better than \"rules of thumb\"-derived manual baselines. Moreover, we demonstrate that our fully refined classifiers enhance the GA throughput on average across the topologies by 22% while reducing search time by a factor of 6.47x.","PeriodicalId":442645,"journal":{"name":"2019 IEEE International Conference on Autonomic Computing (ICAC)","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE International Conference on Autonomic Computing (ICAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAC.2019.00025","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

With the advent of Big Data platforms like Apache Storm, computations once deemed infeasible locally become possible at scale. However, doing so entails orchestrating powerful yet expensive clusters. With its focus on stream processing, Storm optimizes for low-latency and high throughput. However, to realize this goal and thereby maximize the utility of these clusters' resources, operators must execute these tasks under their optimal configurations. Yet, the search space for finding such configurations is so vast and time-consuming to explore so as to be effectively intractable due to issues like the temporal overhead of testing new candidate configurations, the sheer number of permutations of parameters within each configuration and their interdependence among each other. In order to efficiently cover the search space, we automate the process with genetic algorithms. Moreover, we fuse this technique not only with additional cluster information gleaned from JMX profiling and Storm performance data but also with classifiers constructed from training data from past executions of a plethora of Storm topologies. Utilizing a diverse set of Storm benchmark topologies as evaluation data, we show that the fully enhanced genetic algorithms can efficiently find configurations that perform on average 4.67x better than "rules of thumb"-derived manual baselines. Moreover, we demonstrate that our fully refined classifiers enhance the GA throughput on average across the topologies by 22% while reducing search time by a factor of 6.47x.
预测风暴:利用遗传算法和监督学习预测最优配置
随着像Apache Storm这样的大数据平台的出现,曾经被认为在本地不可行的计算变得可以大规模实现。然而,这样做需要编排功能强大但成本昂贵的集群。Storm专注于流处理,优化了低延迟和高吞吐量。然而,为了实现这一目标,从而最大限度地利用这些集群的资源,运营商必须在最优配置下执行这些任务。然而,由于测试新候选配置的时间开销、每个配置中参数排列的绝对数量以及它们之间的相互依赖性等问题,寻找这些配置的搜索空间是如此巨大和耗时,以至于很难有效地处理。为了有效地覆盖搜索空间,我们使用遗传算法将这一过程自动化。此外,我们不仅将该技术与从JMX分析和Storm性能数据中收集的额外集群信息融合在一起,还将该技术与从过去执行的大量Storm拓扑的训练数据中构建的分类器融合在一起。利用一组不同的Storm基准拓扑作为评估数据,我们表明,完全增强的遗传算法可以有效地找到比“经验法则”衍生的手动基线平均性能好4.67倍的配置。此外,我们证明了我们的完全改进的分类器在拓扑上平均提高了22%的GA吞吐量,同时将搜索时间减少了6.47倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信