Phase Annotated Learning for Apache Spark: Workload Recognition and Characterization

Seyed Ali Jokar Jandaghi, Arnamoy Bhattacharyya, C. Amza
Published in: 2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom)
Publication date: 2018-12-01
DOI: 10.1109/CloudCom2018.2018.00018 (https://doi.org/10.1109/CloudCom2018.2018.00018)
Citation count: 2

Abstract

In this paper, we introduce and evaluate a novel resource modeling technique for profiling and detecting Spark workloads and predicting their resource usage. Specifically, we profile resource usage data in Spark and annotate it with the application contexts in which the resources were used. We then model the resource usage, per context, using a Mixture of Gaussians (MOG) probability distribution. When we recognize a similar workload, we can thus predict its resource usage for the contexts modeled a priori. To test our Spark stage annotator and workload modeling tool experimentally, we profiled eight Apache Spark workloads. Our results show that, whenever a previously modeled workload is detected, our MOG models can be used to predict resource consumption with high accuracy.
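
The per-context modeling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-dimensional resource samples (for example, CPU utilization in percent) already grouped by context, fits each group with a small expectation-maximization (EM) loop in plain Python, and recognizes an unseen trace by picking the context model with the highest average log-likelihood. The function names (fit_mog, recognize, expected_usage), the context labels (stage_a, stage_b), and the synthetic data are all hypothetical.

```python
import math
import random


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_mog(samples, k=2, iters=60, min_var=1e-3):
    """Fit a 1-D mixture of k Gaussians with plain EM.

    Returns (weights, means, variances). Hypothetical helper, not the paper's code.
    """
    xs = sorted(samples)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, shared global variance.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    centre = sum(xs) / n
    variances = [max(sum((x - centre) ** 2 for x in xs) / n, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances


def avg_log_likelihood(samples, model):
    """Average log-density of the samples under a fitted mixture."""
    weights, means, variances = model
    total = 0.0
    for x in samples:
        p = sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total / len(samples)


def recognize(samples, models):
    """Name of the context model that best explains the samples."""
    return max(models, key=lambda name: avg_log_likelihood(samples, models[name]))


def expected_usage(model):
    """Mean resource usage implied by the mixture (weighted sum of component means)."""
    weights, means, _ = model
    return sum(w * m for w, m in zip(weights, means))


# Synthetic, made-up profiles standing in for annotated resource traces.
rng = random.Random(0)
profiles = {
    "stage_a": [rng.gauss(20, 2) for _ in range(200)] + [rng.gauss(80, 3) for _ in range(200)],
    "stage_b": [rng.gauss(50, 4) for _ in range(400)],
}
models = {name: fit_mog(xs) for name, xs in profiles.items()}

unseen = [rng.gauss(50, 4) for _ in range(50)]
match = recognize(unseen, models)
print(match, round(expected_usage(models[match]), 1))
```

In this sketch, recognition and prediction are deliberately separated: recognize only selects a previously fitted model, and expected_usage then reads a prediction off that model. A real profiler would more likely work with multi-dimensional samples (CPU, memory, I/O) and a library mixture model; the one-dimensional EM loop here is only meant to make the mechanics visible.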