Exploring the Challenges and Opportunities of Cloud Stacks in Dynamic Resource Environments

2017 IEEE 3rd International Conference on Collaboration and Internet Computing (CIC). Pub Date: 2017-10-01. DOI: 10.1109/CIC.2017.00061

Fan Yang, Haryadi S. Gunawi, A. Chien

Abstract

Traditional cloud stacks are designed to tolerate server- or rack-level failures that are unpredictable and uncorrelated. Such stacks successfully deliver highly available cloud services at global scale. The increasing criticality of cloud services to the world economy is raising concern about the impact on cloud availability of power outages, cyber-attacks, configuration errors, and other causes of datacenter-scale or larger failures. Recent experience shows that these events can trigger cascading failures and global-scale service outages. We study the impact of correlated datacenter resource failures, exploring distributed protocols (widely used in Cassandra) across varied configurations and levels of resource availability. Our study reveals that using such protocols to achieve high availability on resources with large-scale, correlated outages is costly in storage and update traffic, requiring replication factors of 10 or more. Further analysis reveals that this limitation arises from inflexible replication and quorum.
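The cost of majority quorums under datacenter-scale outages can be illustrated with a small calculation. The sketch below is not the authors' model or experimental setup; it assumes one replica per datacenter, a majority quorum of floor(RF/2) + 1 (the rule behind Cassandra's QUORUM consistency level), and that each datacenter is up independently with a hypothetical probability `p_up`. The function names and the example values of `p_up` are illustrative assumptions.

```python
from math import comb


def quorum_size(rf: int) -> int:
    """Majority quorum for a replication factor rf: floor(rf / 2) + 1."""
    return rf // 2 + 1


def quorum_availability(rf: int, p_up: float) -> float:
    """Probability that at least a majority of rf replicas are reachable.

    Assumption (not from the paper): one replica per datacenter, and each
    datacenter is up independently with probability p_up, so the number of
    reachable replicas follows a binomial distribution.
    """
    q = quorum_size(rf)
    return sum(
        comb(rf, k) * p_up**k * (1 - p_up) ** (rf - k)
        for k in range(q, rf + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-datacenter availabilities, chosen only for illustration.
    for p_up in (0.99, 0.9, 0.7):
        for rf in (3, 5, 11):
            print(f"p_up={p_up} rf={rf} quorum={quorum_size(rf)} "
                  f"availability={quorum_availability(rf, p_up):.6f}")
```

Under these assumptions, a fixed majority rule means that lower per-datacenter availability can only be offset by adding replicas, and every added replica increases both storage and update traffic.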

