Phase Annotated Learning for Apache Spark: Workload Recognition and Characterization
Seyed Ali Jokar Jandaghi, Arnamoy Bhattacharyya, C. Amza
2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), December 2018
DOI: 10.1109/CloudCom2018.2018.00018 (https://doi.org/10.1109/CloudCom2018.2018.00018)
Citations: 2
Abstract
In this paper, we introduce and evaluate a novel resource modeling technique for workload profiling, detection, and resource usage prediction for Spark workloads. Specifically, we profile and annotate resource usage data in Spark with the application contexts where the resources were used. We then model the resource usage, per context, based on a Mixture of Gaussians (MOG) probabilistic distribution technique. When we recognize a similar workload, we can thus predict its resource usage for the contexts modeled a priori. To experimentally test the functionality of our Spark stage annotator and workload modeling tool, we performed workload profiling for eight Apache Spark workloads. Our results show that, whenever a previously modeled workload is detected, our MOG models can be used to predict resource consumption with high accuracy.
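The core idea of per-context MOG modeling can be illustrated with a minimal sketch. The following is not the authors' implementation: it assumes a hypothetical profiling output (a dictionary mapping annotated Spark stage names to arrays of [CPU%, memory MB] samples), uses scikit-learn's GaussianMixture as the MOG fit, and picks an arbitrary log-likelihood threshold for recognizing a previously modeled context.

```python
# Minimal sketch (not the paper's code): fit one Mixture of Gaussians per
# annotated Spark stage, then recognize a new trace and predict its usage.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_stage_models(stage_samples, n_components=3):
    """Fit one MOG per annotated stage (context). stage_samples is a
    hypothetical dict: stage name -> array of [cpu_percent, memory_mb] rows."""
    models = {}
    for stage, samples in stage_samples.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full", random_state=0)
        gmm.fit(np.asarray(samples))
        models[stage] = gmm
    return models

def predict_stage_usage(models, stage):
    """Expected resource usage for a modeled stage: the mixture mean,
    i.e. component means weighted by component weights."""
    gmm = models[stage]
    return gmm.weights_ @ gmm.means_  # e.g. [expected_cpu, expected_mem]

def recognize_workload(models, new_trace, threshold=-20.0):
    """Match a new trace to a previously modeled stage by average
    per-sample log-likelihood; threshold is an illustrative assumption."""
    X = np.asarray(new_trace)
    scores = {stage: gmm.score(X) for stage, gmm in models.items()}
    best_stage, best_score = max(scores.items(), key=lambda kv: kv[1])
    return (best_stage, best_score) if best_score > threshold else (None, best_score)

if __name__ == "__main__":
    # Synthetic illustration: two stages with distinct CPU/memory profiles.
    rng = np.random.default_rng(0)
    profiles = {
        "stage_map":    rng.normal([60.0, 2000.0], [5.0, 100.0], size=(200, 2)),
        "stage_reduce": rng.normal([30.0, 4000.0], [5.0, 150.0], size=(200, 2)),
    }
    models = fit_stage_models(profiles)
    print("Expected usage for stage_map:", predict_stage_usage(models, "stage_map"))
    new_trace = rng.normal([58.0, 2050.0], [5.0, 100.0], size=(50, 2))
    print("Recognized as:", recognize_workload(models, new_trace))
```

In this sketch, recognition and prediction mirror the abstract's two steps: a new trace is scored against each previously fitted per-context model, and once a match is found, the matched model's mixture mean serves as the resource-usage prediction for that context.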