On the impact of virtualization on the I/O performance of analytic workloads
S. Ha, D. Venzano, P. Brown, P. Michiardi
2016 2nd International Conference on Cloud Computing Technologies and Applications (CloudTech), published 2016-05-24
DOI: 10.1109/CLOUDTECH.2016.7847722
Citations: 6
Abstract
In this work we study the I/O performance of long, sequential workloads that mimic those of Big Data applications, to understand the implications of system virtualization on data-intensive frameworks such as Apache Hadoop and Spark, which are frequently run in clusters of Virtual Machines (VMs). We do so through an experimental measurement campaign that collects low-level traces and metrics, to show the role played by important parameters such as the I/O schedulers and caching mechanisms involved in the I/O path, and the VM configuration in terms of dedicated resources. Our findings are important, especially for determining appropriate deployment strategies for today's emerging Analytics Services hosted both on public and private clouds.
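The measurement idea described above (long sequential I/O, with attention to the active I/O scheduler and to caching along the I/O path) can be illustrated with a small sketch. This is not the authors' measurement harness. It assumes a Linux guest, where the active block-layer scheduler is shown in brackets in /sys/block/<dev>/queue/scheduler. The device name "sda" and the helper names `parse_active_scheduler` and `sequential_write_read` are hypothetical, chosen only for illustration.

```python
import os
import tempfile
import time


def parse_active_scheduler(sysfs_text: str) -> str:
    """Return the active scheduler from the contents of
    /sys/block/<dev>/queue/scheduler, where the active entry is bracketed,
    e.g. 'noop deadline [cfq]' or '[mq-deadline] kyber bfq none'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active scheduler marked in: " + sysfs_text)


def sequential_write_read(path: str, total_mb: int = 64, block_kb: int = 1024) -> dict:
    """Write a file sequentially in fixed-size blocks, fsync it, then read it
    back sequentially. Returns bytes read and throughput in MB/s for each phase."""
    block = os.urandom(block_kb * 1024)
    n_blocks = (total_mb * 1024) // block_kb
    written = n_blocks * len(block)

    # Sequential write phase; fsync forces data out of the guest page cache.
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    write_s = max(time.perf_counter() - t0, 1e-9)

    # Sequential read phase. Unless caches are dropped first (as root:
    # echo 3 > /proc/sys/vm/drop_caches), this is likely served from the
    # guest page cache, one of the caching layers the abstract refers to.
    t0 = time.perf_counter()
    read_bytes = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(len(block))
            if not chunk:
                break
            read_bytes += len(chunk)
    read_s = max(time.perf_counter() - t0, 1e-9)

    mb = 1024 * 1024
    return {
        "bytes": read_bytes,
        "write_MBps": written / mb / write_s,
        "read_MBps": read_bytes / mb / read_s,
    }


if __name__ == "__main__":
    # Hypothetical device name; adjust to the guest's actual block device.
    sched_path = "/sys/block/sda/queue/scheduler"
    if os.path.exists(sched_path):
        with open(sched_path) as f:
            print("active scheduler:", parse_active_scheduler(f.read()))
    with tempfile.TemporaryDirectory() as d:
        print(sequential_write_read(os.path.join(d, "seq.bin")))
```

Comparing the read figure with and without dropping caches, or across schedulers, gives a rough sense of how much each layer in the I/O path contributes. A real study would use much larger files and low-level tracing rather than wall-clock timing alone.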