Title: Banking on decoupling: budget-driven sustainability for HPC applications on auction-based clouds
Authors: Moussa Taifi
Journal: ACM SIGOPS Operating Systems Review (ACM SIGOPS Oper. Syst. Rev.)
Published: 2013-07-23 (Journal Article)
DOI: 10.1145/2506164.2506172 (https://doi.org/10.1145/2506164.2506172)
Citations: 6
Abstract
Cloud providers auction their excess capacity as dynamically priced virtual instances. These spot instances offer significant savings compared with on-demand or fixed-price instances. Users of these resources must specify a maximum bid price per hour, and the cloud provider runs their instances only as long as the market price stays below that bid. Users are therefore explicitly exposed to failures, since an instance is reclaimed once the market price rises past its bid, and they must adapt their applications to provide some level of fault tolerance. In this paper, we examine the effect of bidding on virtual HPC clusters composed of spot instances. We describe how uniform and non-uniform bidding differ in their effect on both the failure rate and the failure model. We make an initial attempt at predicting the runtime of a parallel application under various bidding strategies and system parameters. We describe the relationship between bidding strategies and programming models, and we build a preliminary optimization model whose inputs are real price traces from Amazon Web Services together with measured processing and network capacities of cluster instances on EC2. Our results offer preliminary insights into the relationship between non-uniform bidding and application scaling strategies.
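To make the uniform versus non-uniform distinction concrete, here is a minimal Python sketch. It assumes a discrete price trace with one market price per time step and one bid per cluster node. It is not the paper's optimization model: the function names (`termination_step`, `running_nodes`, `failure_events`) and the price trace are hypothetical and chosen only for illustration. With a uniform bid, every node is reclaimed at the same step, so the whole cluster fails at once. Spreading the bids across nodes turns that single correlated failure into a sequence of partial failures.

```python
from typing import List, Optional


def termination_step(prices: List[float], bid: float) -> Optional[int]:
    """First time step at which the instance is reclaimed, or None if it
    survives the whole trace. Assumption (following the abstract's wording):
    an instance runs only while the market price is strictly below its bid."""
    for t, price in enumerate(prices):
        if price >= bid:
            return t
    return None


def running_nodes(prices: List[float], bids: List[float]) -> List[int]:
    """Number of cluster nodes still alive at each time step (one bid per node)."""
    stops = [termination_step(prices, b) for b in bids]
    return [
        sum(1 for s in stops if s is None or s > t)
        for t in range(len(prices))
    ]


def failure_events(prices: List[float], bids: List[float]) -> List[int]:
    """Distinct time steps at which at least one node is lost."""
    stops = {termination_step(prices, b) for b in bids}
    return sorted(s for s in stops if s is not None)


# Hypothetical hourly spot-price trace (illustrative only, NOT real AWS data).
PRICES = [0.10, 0.12, 0.15, 0.30, 0.18, 0.45, 0.20]

uniform = [0.25] * 4                    # every node bids the same price
non_uniform = [0.14, 0.25, 0.40, 0.50]  # bids spread across nodes

print(running_nodes(PRICES, uniform))       # [4, 4, 4, 0, 0, 0, 0]
print(failure_events(PRICES, uniform))      # [3]
print(running_nodes(PRICES, non_uniform))   # [4, 4, 3, 2, 2, 1, 1]
print(failure_events(PRICES, non_uniform))  # [2, 3, 5]
```

Returning `None` for a node that is never reclaimed keeps "survived the trace" distinct from "reclaimed at step 0". The node-count curves are the kind of input a runtime predictor would need, but how an application scales as nodes disappear is not modeled here.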