Otterman: A Novel Approach of Spark Auto-tuning by a Hybrid Strategy

Haizhou Du, Ping Han, Wei Chen, Yi Wang, Chenlu Zhang
Published in: 2018 5th International Conference on Systems and Informatics (ICSAI), November 2018
DOI: 10.1109/ICSAI.2018.8599304 (https://doi.org/10.1109/ICSAI.2018.8599304)
Citations: 7

Abstract

Spark has become a very attractive platform for big data analytics in recent years thanks to advantages such as parallelism, fault tolerance, and reduced complexity of cluster setup. On the Spark platform, users can adjust parameter configurations to suit different job requirements and specific applications in order to optimize performance. This raises a problem that cannot be ignored: Spark already has more than 180 parameters, and the huge number of possible parameter combinations means that manual tuning cannot capture the impact of every parameter on performance. To reduce the heavy reliance on expert experience and manual operation, we propose Otterman, a parameter optimization approach that combines the Simulated Annealing algorithm with the Least Squares method and dynamically adjusts parameters according to job type to obtain an optimal configuration and improve performance. Simulated Annealing can find the optimal solution but converges slowly, so we use the Least Squares method to speed up its convergence to the optimal solution. Otterman is simple and easy to apply, with no additional cost. Experiments verify the effectiveness of the approach: the results show that Otterman improves average performance by 30% compared with the default parameter configuration, with an accuracy of about 68%.
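The abstract does not say how Simulated Annealing and Least Squares are combined, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes two hypothetical parameters normalized to [0, 1], a synthetic `runtime` function standing in for a measured Spark job, and a least-squares plane fitted to recent samples whose slope nudges each annealing proposal downhill. All function names and constants here are illustrative assumptions.

```python
import math
import random

def runtime(cfg):
    """Synthetic stand-in for a measured job runtime (assumption: a real
    tuner would launch a Spark job with this configuration and time it)."""
    x, y = cfg
    return 10.0 + 40.0 * (x - 0.7) ** 2 + 25.0 * (y - 0.3) ** 2

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # singular system: no usable fit
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    sol = [0.0, 0.0, 0.0]
    for r in reversed(range(3)):
        s = m[r][3] - sum(m[r][c] * sol[c] for c in range(r + 1, 3))
        sol[r] = s / m[r][r]
    return sol

def fit_plane(samples):
    """Least-squares fit of cost ~ w0 + w1*x + w2*y via the normal equations."""
    rows = [(1.0, x, y) for (x, y), _ in samples]
    costs = [c for _, c in samples]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * c for r, c in zip(rows, costs)) for i in range(3)]
    return solve3(ata, atb)

def clamp(v):
    """Keep a normalized parameter inside [0, 1]."""
    return min(1.0, max(0.0, v))

def tune(steps=300, seed=0, t0=5.0, cooling=0.98, step=0.1, window=12):
    """Simulated annealing whose proposals are biased by a least-squares slope."""
    rng = random.Random(seed)
    current = (rng.random(), rng.random())
    cur_cost = runtime(current)
    best, best_cost = current, cur_cost
    history = [(current, cur_cost)]  # every evaluated configuration
    temp = t0
    for _ in range(steps):
        # Random perturbation, as in plain simulated annealing.
        dx, dy = rng.uniform(-step, step), rng.uniform(-step, step)
        # Assumed hybrid step: shift the proposal against the fitted slope.
        w = fit_plane(history[-window:]) if len(history) >= 4 else None
        if w is not None:
            norm = math.hypot(w[1], w[2])
            if norm > 1e-9:
                dx -= 0.5 * step * w[1] / norm
                dy -= 0.5 * step * w[2] / norm
        cand = (clamp(current[0] + dx), clamp(current[1] + dy))
        cand_cost = runtime(cand)
        history.append((cand, cand_cost))
        # Metropolis rule: always accept improvements, sometimes accept worse.
        delta = cand_cost - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = current, cur_cost
        temp *= cooling  # geometric cooling schedule
    return best, best_cost
```

Calling `best, cost = tune()` returns the lowest-cost configuration accepted during the run together with its synthetic runtime. In a real setting each call to `runtime` would be an expensive job execution, which is why reusing already-measured samples to steer proposals is attractive.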