Auto-tuning Spark big data workloads on POWER8: Prediction-based dynamic SMT threading

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT). Pub Date: 2016-09-11. DOI: 10.1145/2967938.2967957

Zhen Jia, Chao Xue, Guancheng Chen, Jianfeng Zhan, Lixin Zhang, Yonghua Lin, H. P. Hofstee


Citations: 25

Abstract

Much research is devoted to tuning big data analytics in modern data centers, since at that scale even a small percentage of performance improvement translates immediately into large cost savings. Simultaneous multithreading (SMT) has attracted great interest from the data center community because it can boost the performance of big data analytics by increasing processor resource utilization. For example, emerging processor architectures such as POWER8 support up to 8-way multithreading. However, because different big data workloads have disparate architectural characteristics, identifying the SMT configuration that delivers the best performance is challenging, given the complexity of both application behavior and processor architecture. In this paper, we focus specifically on auto-tuning the SMT configuration for Spark-based big data workloads on POWER8, although our methodology could be generalized and extended to other software stacks and other architectures. We propose a prediction-based dynamic SMT threading (PBDST) framework that uses machine learning algorithms to adjust the thread count in SMT cores on POWER8 processors. Its innovation lies in using online SMT configuration predictions, derived from microarchitecture-level profiling, to select thread counts that achieve nearly optimal performance. Moreover, it is implemented at the Spark software stack layer and is transparent to user applications. After evaluating a large set of machine learning algorithms, we choose the most efficient ones to perform online predictions. The experimental results demonstrate that our approach achieves up to a 56.3% performance improvement, and an average performance gain of 16.2%, compared with the default configuration on our system, which is the maximum SMT configuration (SMT8).
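The abstract does not say which machine learning algorithms or which profiling features PBDST finally uses, so the following is only a minimal sketch of the general idea: map a vector of microarchitecture-level measurements to one of the SMT modes available on POWER8 (SMT1, SMT2, SMT4, SMT8). The 1-nearest-neighbor classifier, the three features, and every number in the training table are assumptions made for illustration and are not taken from the paper.

```python
from math import dist

# SMT modes available on POWER8 (hardware threads per core).
SMT_MODES = (1, 2, 4, 8)

# Hypothetical offline training data, invented for this sketch:
# (IPC, L1D misses per kilo-instruction, branch mispredict rate) -> best SMT mode.
TRAINING = [
    ((1.8, 5.0, 0.01), 2),
    ((1.5, 8.0, 0.02), 2),
    ((0.9, 20.0, 0.03), 4),
    ((0.7, 25.0, 0.05), 4),
    ((0.4, 45.0, 0.04), 8),
    ((0.3, 60.0, 0.06), 8),
]


def _column_scales(samples):
    """Return (minimum, span) per feature column for min-max scaling."""
    columns = list(zip(*(features for features, _ in samples)))
    return [(min(col), (max(col) - min(col)) or 1.0) for col in columns]


def _scale(features, scales):
    """Rescale each feature so that no single counter dominates the distance."""
    return tuple((x - lo) / span for x, (lo, span) in zip(features, scales))


def predict_smt(features, samples=TRAINING):
    """Predict an SMT mode with 1-nearest-neighbor (a stand-in model,
    not necessarily one of the algorithms selected in the paper)."""
    scales = _column_scales(samples)
    target = _scale(features, scales)
    _, mode = min((dist(_scale(f, scales), target), m) for f, m in samples)
    return mode


if __name__ == "__main__":
    # A profiled sample that resembles the low-IPC, miss-heavy training rows.
    print(predict_smt((0.35, 50.0, 0.05)))
```

In an online setting, a predictor of this kind would be called periodically with freshly sampled counters, and the returned mode would then be applied to the cores; how PBDST performs that step inside the Spark stack is not described in the abstract.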


