Efficient and robust allocation algorithms in clouds under memory constraints

2014 21st International Conference on High Performance Computing (HiPC), Pub Date: 2013-10-19, DOI: 10.1109/HiPC.2014.7116894

Olivier Beaumont, J. Lorenzo, Lionel Eyraud-Dubois, Paul Renaud-Goud

Citations: 4

Abstract

We consider robust resource allocation of services in Clouds. More specifically, we consider the case of a large public or private Cloud platform in which a relatively small set of large, independent services accounts for most of the platform's overall CPU usage. Using a recent trace from Google, we show that this assumption is very reasonable in practice. The objective is to allocate the services onto the machines of the platform, using replication to be resilient to machine failures. The services are characterized by their demand along several dimensions (CPU, memory, ...) and by their quality of service requirements, which are defined through an SLA in the case of a public Cloud or fixed by the administrator in the case of a private Cloud. This quality of service defines the required robustness of the service by setting an upper limit on the probability that the provider fails to allocate the required quantity of resources. This maximum probability of failure can be transparently turned into a set of (price, penalty) pairs. Our contribution is two-fold. First, we propose a formal model for this allocation problem, and we justify our assumptions based on an analysis of a publicly available cluster usage trace from Google. Second, we propose a resource allocation strategy whose complexity is low in the number of resources, which makes it well suited to large platforms. Finally, we analyze the proposed strategy through an extensive set of simulations, showing that it can be successfully applied in the context of the Google trace.
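The robustness requirement above (an upper limit on the probability that the provider fails to allocate the required quantity of resources) can be made concrete with a small check. The sketch below is not the authors' model or allocation strategy. It assumes a single resource dimension, integer allocations, and machines that fail independently with known probabilities. The function names `failure_probability` and `meets_sla` are hypothetical and chosen for illustration only.

```python
from typing import Sequence


def failure_probability(alloc: Sequence[int], p_fail: Sequence[float], demand: int) -> float:
    """Probability that the surviving machines provide fewer than `demand` units.

    Hypothetical model (an assumption, not from the paper): machine j hosts
    alloc[j] integer units of one resource for the service and fails
    independently with probability p_fail[j].
    """
    if len(alloc) != len(p_fail):
        raise ValueError("alloc and p_fail must have the same length")
    # dist[c] = probability that the machines processed so far provide exactly
    # c surviving units, capped at `demand` (anything >= demand is a success).
    dist = {0: 1.0}
    for a, p in zip(alloc, p_fail):
        nxt = {}
        for c, prob in dist.items():
            # Machine fails: surviving capacity is unchanged.
            nxt[c] = nxt.get(c, 0.0) + prob * p
            # Machine survives: add its allocation, capped at the demand.
            up = min(c + a, demand)
            nxt[up] = nxt.get(up, 0.0) + prob * (1.0 - p)
        dist = nxt
    # The service fails whenever the surviving capacity is below the demand.
    return sum(prob for c, prob in dist.items() if c < demand)


def meets_sla(alloc: Sequence[int], p_fail: Sequence[float], demand: int, epsilon: float) -> bool:
    """True if the failure probability stays within the SLA bound `epsilon`."""
    return failure_probability(alloc, p_fail, demand) <= epsilon


if __name__ == "__main__":
    # Demand of 4 units, each machine failing with probability 0.01 (illustrative values).
    print("single machine:", failure_probability([4], [0.01], 4))
    print("three replicas:", failure_probability([2, 2, 2], [0.01] * 3, 4))
```

Under these illustrative numbers, placing the 4 required units on a single machine gives a failure probability of 0.01, while spreading 2 units on each of three machines (6 units reserved in total) fails only when at least two machines fail, about 3e-4. This shows the kind of trade-off between reserved resources and robustness that an SLA bound such as `epsilon = 1e-3` forces a provider to make.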
