Service Clustering for Autonomic Clouds Using Random Forest

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Pub Date: 2015-05-04. DOI: 10.1109/CCGrid.2015.41

Rafael Brundo Uriarte, S. Tsaftaris, F. Tiezzi

Pages: 515-524

Citations: 26

Abstract

Managing and optimising cloud services is one of the main challenges faced by industry and academia. A possible solution is self-management, as fostered by autonomic computing. However, the abstraction layer provided by cloud computing obscures several details of the provided services, which in turn hinders the effectiveness of autonomic managers. Data-driven approaches, particularly those that cluster services using machine learning techniques, can assist autonomic management and support decisions concerning, for example, the scheduling and deployment of services. One complication is that monitoring information contains both continuous (e.g. CPU load) and categorical (e.g. VM instance type) data. Current approaches treat this problem heuristically. This paper instead proposes an approach that uses all kinds of data and learns, in a data-driven fashion, the similarities and resource usage patterns among services. In particular, we use an unsupervised formulation of the Random Forest algorithm to calculate similarities and provide them as input to a clustering algorithm. For efficiency, and to meet the dynamism requirement of autonomic clouds, our methodology consists of two steps: (i) off-line clustering and (ii) on-line prediction. Using datasets from real-world clouds, we demonstrate the superiority of our solution over others and validate the accuracy of the on-line prediction. Moreover, to show the applicability of our approach, we devise a service scheduler that uses the notion of similarity among services and evaluate it in a cloud test-bed.
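
The two-step pipeline described in the abstract (tree-based similarities fed to a clustering algorithm off-line, then on-line prediction for new services) can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the details of their unsupervised Random Forest formulation. As a stand-in, this sketch grows completely random trees over mixed continuous and categorical features and takes the similarity of two services to be the fraction of trees in which they reach the same leaf. The choice of k-medoids as the clustering algorithm, the sample records, and all function names and parameters (`grow`, `proximity`, `k_medoids`, `predict`, `max_depth`, `min_rows`) are assumptions made for illustration.

```python
import random

def grow(rows, rng, depth=0, max_depth=3, min_rows=3):
    """Grow one completely random tree over mixed numeric/categorical rows."""
    if depth >= max_depth or len(rows) <= min_rows:
        return None  # None marks a leaf
    f = rng.randrange(len(rows[0]))
    values = sorted({r[f] for r in rows})
    if len(values) < 2:
        return None  # the chosen feature is constant here: stop splitting
    is_cat = isinstance(values[0], str)
    # Categorical: one category vs. the rest. Numeric: a random threshold.
    split = rng.choice(values) if is_cat else rng.uniform(values[0], values[-1])
    goes_left = (lambda r: r[f] == split) if is_cat else (lambda r: r[f] < split)
    left = [r for r in rows if goes_left(r)]
    right = [r for r in rows if not goes_left(r)]
    if not left or not right:
        return None
    return (f, is_cat, split,
            grow(left, rng, depth + 1, max_depth, min_rows),
            grow(right, rng, depth + 1, max_depth, min_rows))

def leaf_path(tree, row):
    """Return the left/right path a row follows; equal paths mean same leaf."""
    path = ""
    while tree is not None:
        f, is_cat, split, left, right = tree
        go_left = (row[f] == split) if is_cat else (row[f] < split)
        path += "L" if go_left else "R"
        tree = left if go_left else right
    return path

def proximity(forest, a, b):
    """Fraction of trees in which rows a and b land in the same leaf."""
    return sum(leaf_path(t, a) == leaf_path(t, b) for t in forest) / len(forest)

def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]

def k_medoids(dist, k, max_iter=20):
    """Alternating k-medoids on a precomputed dissimilarity matrix."""
    n = len(dist)
    medoids = [0]
    while len(medoids) < k:  # farthest-first initialisation
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # Keep the old medoid if a cluster happens to be empty.
            new.append(min(members, key=lambda i: sum(dist[i][j] for j in members))
                       if members else medoids[c])
        if new == medoids:
            break
        medoids = new
    return medoids, assign(dist, medoids)

def predict(forest, rows, medoids, new_row):
    """On-line step: pick the cluster whose medoid is most similar."""
    return max(range(len(medoids)),
               key=lambda c: proximity(forest, new_row, rows[medoids[c]]))

rng = random.Random(0)
# Hypothetical monitoring records: (cpu_load, mem_load, vm_instance_type).
services = ([(0.10 + 0.02 * i, 0.20 + 0.01 * i, "small") for i in range(6)] +
            [(0.80 + 0.02 * i, 0.70 + 0.01 * i, "large") for i in range(6)])

# Off-line step: build the forest, derive dissimilarities, cluster.
forest = [grow(services, rng) for _ in range(200)]
dist = [[1.0 - proximity(forest, a, b) for b in services] for a in services]
medoids, labels = k_medoids(dist, k=2)

# On-line step: classify a newly observed service without re-clustering.
new_service = (0.15, 0.22, "small")
predicted = predict(forest, services, medoids, new_service)
print(labels, predicted)
```

Because similarity is defined only by which leaf a record reaches, continuous and categorical attributes are handled uniformly and no hand-tuned distance weights are needed. The on-line step reuses the forest and the medoids from the off-line step, so a new service costs one pass through each tree rather than a full re-clustering.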


