Phase Annotated Learning for Apache Spark: Workload Recognition and Characterization
Seyed Ali Jokar Jandaghi, Arnamoy Bhattacharyya, C. Amza
2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), December 2018
DOI: 10.1109/CloudCom2018.2018.00018 (https://doi.org/10.1109/CloudCom2018.2018.00018)
Citations: 2
Abstract
In this paper, we introduce and evaluate a novel resource modeling technique for workload profiling, detection, and resource usage prediction for Spark workloads. Specifically, we profile and annotate resource usage data in Spark with the application contexts where the resources were used. We then model the resource usage, per context, based on a Mixture of Gaussians (MOG) probabilistic distribution technique. When we recognize a similar workload, we can thus predict its resource usage for the contexts modeled a priori. To experimentally test the functionality of our Spark stage annotator and workload modeling tool, we performed workload profiling for eight Apache Spark workloads. Our results show that, whenever a previously modeled workload is detected, our MOG models can be used to predict resource consumption with high accuracy.
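The core idea of per-context MOG modeling can be illustrated with a minimal sketch. The following is not the authors' implementation: it assumes a hypothetical profiling output (a dictionary mapping annotated Spark stage names to arrays of [CPU%, memory MB] samples), uses scikit-learn's GaussianMixture as the MOG fit, and picks an arbitrary log-likelihood threshold for recognizing a previously modeled context.

```python
# Minimal sketch (not the paper's code): fit one Mixture of Gaussians per
# annotated Spark stage, then recognize a new trace and predict its usage.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_stage_models(stage_samples, n_components=3):
    """Fit one MOG per annotated stage (context). stage_samples is a
    hypothetical dict: stage name -> array of [cpu_percent, memory_mb] rows."""
    models = {}
    for stage, samples in stage_samples.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full", random_state=0)
        gmm.fit(np.asarray(samples))
        models[stage] = gmm
    return models

def predict_stage_usage(models, stage):
    """Expected resource usage for a modeled stage: the mixture mean,
    i.e. component means weighted by component weights."""
    gmm = models[stage]
    return gmm.weights_ @ gmm.means_  # e.g. [expected_cpu, expected_mem]

def recognize_workload(models, new_trace, threshold=-20.0):
    """Match a new trace to a previously modeled stage by average
    per-sample log-likelihood; threshold is an illustrative assumption."""
    X = np.asarray(new_trace)
    scores = {stage: gmm.score(X) for stage, gmm in models.items()}
    best_stage, best_score = max(scores.items(), key=lambda kv: kv[1])
    return (best_stage, best_score) if best_score > threshold else (None, best_score)

if __name__ == "__main__":
    # Synthetic illustration: two stages with distinct CPU/memory profiles.
    rng = np.random.default_rng(0)
    profiles = {
        "stage_map":    rng.normal([60.0, 2000.0], [5.0, 100.0], size=(200, 2)),
        "stage_reduce": rng.normal([30.0, 4000.0], [5.0, 150.0], size=(200, 2)),
    }
    models = fit_stage_models(profiles)
    print("Expected usage for stage_map:", predict_stage_usage(models, "stage_map"))
    new_trace = rng.normal([58.0, 2050.0], [5.0, 100.0], size=(50, 2))
    print("Recognized as:", recognize_workload(models, new_trace))
```

In this sketch, recognition and prediction mirror the abstract's two steps: a new trace is scored against each previously fitted per-context model, and once a match is found, the matched model's mixture mean serves as the resource-usage prediction for that context.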