Parameterised architectural patterns for providing cloud service fault tolerance with accurate costings

International Symposium on Component-Based Software Engineering. Pub Date: 2013-06-17. DOI: 10.1145/2465449.2465467

I. I. Yusuf, H. Schmidt


Citations: 15

Abstract

Cloud computing presents a unique opportunity for science and engineering, with benefits over traditional high-performance computing, especially for smaller compute jobs and for users new to parallel computing. However, doubts remain about production high-performance computing in the cloud, the so-called science cloud, because predictable performance, reliability and therefore costs remain elusive for many applications. This paper uses parameterised architectural patterns to assist with fault tolerance and cost predictions for science clouds, in which a single job typically holds many virtual machines for a long time, communication can involve massive data movements, and buffered streams allow parallel processing to proceed while data transfers are still incomplete. We use predictive models, simulation and actual runs to estimate run times with acceptable accuracy for two of the most common architectural patterns for data-intensive scientific computing: MapReduce and Combinational Logic. Run times are fundamental to understanding the fee-for-service costs of clouds, which are typically charged by the hour and by the number of compute nodes or cores used. We evaluate our models using realistic cloud experiments from collaborative physics research projects and show that proactive and reactive fault tolerance is, in principle, manageable, predictable and composable, especially at the architectural level.
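The link between run time and cost described in the abstract can be illustrated with a minimal sketch. It is not the authors' cost model: the function name `estimate_cost`, the per-node hourly rate, and the assumption that partial hours are rounded up to whole billed hours are all hypothetical choices made for illustration only.

```python
import math


def estimate_cost(run_time_hours: float, nodes: int, hourly_rate_per_node: float) -> float:
    """Estimate the fee for one job charged by the hour and per compute node.

    Assumptions (not taken from the paper): every node is held for the
    whole run, and partial hours are rounded up to full billed hours.
    """
    if run_time_hours < 0 or nodes < 0 or hourly_rate_per_node < 0:
        raise ValueError("run time, node count and rate must be non-negative")
    billed_hours = math.ceil(run_time_hours)  # hypothetical whole-hour billing
    return billed_hours * nodes * hourly_rate_per_node


# A small error in the predicted run time can cross an hour boundary
# and change the bill for every node held by the job.
predicted = estimate_cost(9.8, nodes=16, hourly_rate_per_node=0.25)
actual = estimate_cost(10.2, nodes=16, hourly_rate_per_node=0.25)
print(predicted, actual)
```

Under these assumptions, a run-time estimate that is off by less than half an hour still adds one billed hour on each of the 16 nodes, which is why run-time accuracy matters for cost prediction when a single job holds many virtual machines for a long time.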

