Provisioning and Evaluating Multi-domain Networked Clouds for Hadoop-based Applications

2011 IEEE Third International Conference on Cloud Computing Technology and Science; Pub Date: 2011-11-29; DOI: 10.1109/CloudCom.2011.107

A. Mandal, Yufeng Xin, I. Baldine, P. Ruth, Chris Heermann, J. Chase, Victor Orlikowski, Aydan R. Yumerefendi


Citations: 35

Abstract



Provisioning and Evaluating Multi-domain Networked Clouds for Hadoop-based Applications

This paper presents the design, implementation, and evaluation of a new system for on-demand provisioning of Hadoop clusters across multiple cloud domains. The Hadoop clusters are created on demand and are composed of virtual machines from multiple cloud sites linked by bandwidth-provisioned network pipes. The prototype uses an existing federated cloud control framework, the Open Resource Control Architecture (ORCA), which orchestrates the leasing and configuration of virtual infrastructure from multiple autonomous cloud sites and network providers. ORCA enables computational and network resources from multiple clouds and network substrates to be aggregated into a single virtual "slice" of resources, built to order for the needs of the application. The experiments examine various provisioning alternatives by evaluating the performance of representative Hadoop benchmarks and applications on resource topologies with varying bandwidths. The evaluations examine conditions in which multi-cloud Hadoop deployments offer significant advantages or incur significant disadvantages during Map/Reduce/Shuffle operations. Further, the experiments compare multi-cloud Hadoop deployments with single-cloud deployments and investigate Hadoop Distributed File System (HDFS) performance under varying network configurations. The results show that networked clouds make cross-cloud Hadoop deployment feasible when high-bandwidth network links connect the clouds. As expected, performance for some benchmarks degrades rapidly with constrained inter-cloud bandwidth. MapReduce shuffle patterns and certain HDFS operations that span the constrained links are particularly sensitive to network performance. Hadoop's topology-awareness feature can mitigate these penalties to a modest degree in these hybrid bandwidth scenarios. Additional observations show that contention among co-located virtual machines is a source of irregular performance for Hadoop applications on virtual cloud infrastructure.
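The topology-awareness feature mentioned in the abstract is commonly driven by an external script: Hadoop passes one or more node addresses as arguments and reads back one network location per address from standard output. The abstract does not say how the authors configured it, so the following is only a minimal sketch under stated assumptions. It treats each cloud site as one "rack" so that Hadoop can prefer site-local replicas and tasks. The site names and subnets are hypothetical. The script would be registered through the topology script property (`topology.script.file.name` in older Hadoop releases, `net.topology.script.file.name` in later ones).

```python
#!/usr/bin/env python3
"""Sketch of a Hadoop topology script: maps node addresses to per-cloud "racks"."""
import ipaddress
import sys

# Hypothetical subnets, one per cloud site; these values are NOT from the paper.
SITE_SUBNETS = {
    "/cloud-a": ipaddress.ip_network("10.1.0.0/16"),
    "/cloud-b": ipaddress.ip_network("10.2.0.0/16"),
}
# Location returned for any address that matches no known site.
DEFAULT_RACK = "/default-rack"


def rack_for(host: str) -> str:
    """Return the network location for one address, or the default rack."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Hostnames are not resolved in this sketch; they fall back to the default.
        return DEFAULT_RACK
    for rack, subnet in SITE_SUBNETS.items():
        if addr in subnet:
            return rack
    return DEFAULT_RACK


def resolve(hosts):
    """Map a list of addresses to locations, preserving argument order."""
    return [rack_for(h) for h in hosts]


if __name__ == "__main__":
    # Hadoop expects one location per argument, separated by whitespace.
    print(" ".join(resolve(sys.argv[1:])))
```

With a mapping like this, nodes in the same cloud share a location, so Hadoop's locality preferences would tend to keep traffic off the bandwidth-constrained inter-cloud link where it can. As the abstract notes, the benefit observed in the paper was modest.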

