I/O Performance Modeling for Big Data Applications over Cloud Infrastructures

2015 IEEE International Conference on Cloud Engineering. Pub Date: 2015-03-09. DOI: 10.1109/IC2E.2015.29

Ioannis Mytilinis, Dimitrios Tsoumakos, Verena Kantere, Anastassios Nanos, N. Koziris

Citations: 10

Abstract

Big Data applications receive an ever-increasing amount of attention and have become a dominant class of applications deployed over virtualized environments. Cloud environments add considerable complexity with respect to I/O performance. Big Data workloads further complicate I/O management, as well as its characterization and prediction: as I/O operations become increasingly dominant in such applications, the intricacies of virtualization, different storage back ends, and deployment setups significantly hinder our ability to analyze and correctly predict I/O performance. To that end, this work proposes an end-to-end modeling technique to predict the performance of I/O-intensive Big Data applications running over cloud infrastructures. We develop a model tuned over application and infrastructure dimensions: primitive I/O operations, data access patterns, storage back ends, and deployment parameters. The trained model can be used to predict both I/O performance and general task performance. Our evaluation results show that for jobs dominated by I/O operations, such as I/O-bound MapReduce jobs, our model predicts execution time with an accuracy close to 90%; accuracy decreases as application processing becomes more complex.
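The abstract does not specify the form of the trained model, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' technique. It assumes a simple per-configuration linear cost model (time = a + b × size, fit by ordinary least squares on microbenchmark runs), keys each fit on the four dimensions named above, predicts a job's time by summing the costs of its primitive I/O operations (assuming no overlap between them), and measures accuracy as 1 minus relative error. The names `IOConfig`, `fit_linear`, `train`, `predict`, and `accuracy`, the accuracy definition, and all benchmark numbers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical configuration key spanning the dimensions named in the abstract.
@dataclass(frozen=True)
class IOConfig:
    operation: str  # primitive I/O operation, e.g. "read" or "write"
    pattern: str    # data access pattern, e.g. "sequential" or "random"
    backend: str    # storage back end, e.g. "local-disk" (placeholder name)
    vms: int        # deployment parameter, e.g. number of VMs


def fit_linear(samples):
    """Ordinary least squares fit of seconds = a + b * size_mb for one config."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def train(benchmarks):
    """Group (IOConfig, size_mb, seconds) runs by config and fit each group."""
    grouped = defaultdict(list)
    for cfg, size_mb, seconds in benchmarks:
        grouped[cfg].append((size_mb, seconds))
    return {cfg: fit_linear(points) for cfg, points in grouped.items()}


def predict(model, job_ops):
    """Sum per-operation costs; assumes a job's I/O operations do not overlap."""
    total = 0.0
    for cfg, size_mb in job_ops:
        a, b = model[cfg]
        total += a + b * size_mb
    return total


def accuracy(predicted, measured):
    """Assumed metric: 1 - relative error of the prediction."""
    return 1.0 - abs(predicted - measured) / measured


# Made-up microbenchmark data for two configurations.
seq_read = IOConfig("read", "sequential", "local-disk", 4)
seq_write = IOConfig("write", "sequential", "local-disk", 4)
bench = [
    (seq_read, 100, 1.5), (seq_read, 200, 2.5), (seq_read, 400, 4.5),
    (seq_write, 100, 2.2), (seq_write, 200, 4.2), (seq_write, 400, 8.2),
]
model = train(bench)

# Hypothetical job: read 1000 MB, then write 500 MB; pretend it took 23.0 s.
est = predict(model, [(seq_read, 1000), (seq_write, 500)])
acc = accuracy(est, 23.0)
print(f"predicted {est:.1f} s, accuracy {acc:.2f}")  # predicted 20.7 s, accuracy 0.90
```

The additive, no-overlap assumption is the simplest composition rule; a real I/O-bound MapReduce job overlaps reads, computation, and writes, which is one plausible reason prediction accuracy would drop as processing becomes more complex.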
