Feras Al-Hawari, Khaled Tayem, S. Alouneh, Anass Al-Ksasbeh
2022 International Arab Conference on Information Technology (ACIT), published 2022-11-22. DOI: 10.1109/ACIT57182.2022.9994200
Methodology to Evaluate the Performance of Hadoop MapReduce on a Hyper-V Cluster using SAN Storage
Some cloud computing providers deploy Hadoop MapReduce applications in virtualized environments to improve resource utilization. However, virtualization overhead can degrade application performance when applications run on virtual machines rather than physical servers. This paper introduces a methodology for matching the software and hardware specifications of virtual and physical Hadoop clusters, so that virtualization overhead can be measured accurately and MapReduce applications can run efficiently on both clusters. The methodology covers the configuration of non-uniform memory access (NUMA) on the high-end servers used. It also accounts for multipath aggregation, which provides load balancing and failover protection when the servers access SAN storage over a local Ethernet network. The WordCount application was executed on both clusters with workloads of up to 750 GB to evaluate the effect of virtualization on application performance. The results show that the average elapsed time of a MapReduce application on a specific virtual cluster can be 19.2% higher than on a physical cluster, mainly because of I/O throughput degradation on the virtual cluster in that case.
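To make the reported figure concrete, the following is a minimal sketch of how a relative virtualization overhead could be computed from average elapsed times. It assumes the overhead is defined as the relative increase of the mean virtual-cluster time over the mean physical-cluster time, which the abstract does not spell out. The function name `overhead_percent` and the timing values are hypothetical; the timings are not measurements from the paper and were chosen only so that the result matches the 19.2% quoted above.

```python
from statistics import mean


def overhead_percent(physical_s, virtual_s):
    """Relative increase (in percent) of the mean virtual elapsed time
    over the mean physical elapsed time.

    Assumed definition; the paper's exact formula is not given in the abstract.
    """
    if not physical_s or not virtual_s:
        raise ValueError("both timing lists must be non-empty")
    baseline = mean(physical_s)
    if baseline <= 0:
        raise ValueError("physical elapsed times must be positive")
    return (mean(virtual_s) - baseline) / baseline * 100.0


# Hypothetical elapsed times in seconds (NOT data from the paper).
physical_runs = [1000.0, 1000.0]
virtual_runs = [1192.0, 1192.0]

print(round(overhead_percent(physical_runs, virtual_runs), 1))  # 19.2
```

Averaging over several runs before taking the ratio mirrors the abstract's wording ("average elapsed time"); a per-workload breakdown would need one such computation per input size.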