Efficient response time predictions by exploiting application and resource state similarities

The 6th IEEE/ACM International Workshop on Grid Computing, 2005. Pub Date: 2005-11-13. DOI: 10.1109/GRID.2005.1542747

Hui Li, D. Groep, L. Wolters


Citations: 41

Abstract

In large-scale grids with many possible resources (clusters of computing elements) on which to run applications, it is useful for the resources to provide predictions of job response times so that users or resource brokers can make better scheduling decisions. Two metrics need to be estimated for response time predictions: how long a job executes on the resource (application run time) and how long the job waits in the queue before starting (queue wait time). In this paper we propose an instance-based learning technique to predict these two metrics by mining historical workloads. The novelty of our approach is the introduction of policy attributes for representing and comparing resource states, where a resource state is defined as the pool of running and queued jobs on the resource at the time a prediction is made. The policy attributes reflect the local resource scheduling policies, and they can be discovered automatically using a genetic search algorithm. Compared with scheduler simulation, this approach has two main advantages: first, it offers better performance, which helps meet the real-time requirement of Grid resource brokering; second, it is more general, because the scheduling policies are learned from past observations. Our experimental results on the NIKHEF LCG production cluster show that acceptable prediction accuracy can be obtained, with relative prediction errors for response times between 0.35 and 0.70.
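To make the idea of instance-based prediction concrete, the following is a minimal sketch and not the authors' implementation. It predicts a job's run time as the mean run time of the k most similar historical jobs, then adds a queue wait estimate to obtain the response time. The attribute names (`user`, `queue`, `n_procs`), the distance function, and the value of `k` are assumptions chosen for illustration. The abstract's policy attributes, resource-state comparison, and genetic search are not modeled here, so the wait time is simply passed in as a number.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class JobRecord:
    # All attribute names are hypothetical; the abstract does not list them.
    user: str
    queue: str
    n_procs: int
    run_time: Optional[float] = None  # seconds; None for a job not yet run


def distance(a: JobRecord, b: JobRecord) -> float:
    """Assumed heterogeneous distance: 0/1 mismatch for nominal
    attributes, normalized absolute difference for the numeric one."""
    d_user = 0.0 if a.user == b.user else 1.0
    d_queue = 0.0 if a.queue == b.queue else 1.0
    d_procs = abs(a.n_procs - b.n_procs) / max(a.n_procs, b.n_procs, 1)
    return (d_user ** 2 + d_queue ** 2 + d_procs ** 2) ** 0.5


def predict_run_time(query: JobRecord,
                     history: Sequence[JobRecord],
                     k: int = 3) -> float:
    """Mean run time of the k historical jobs closest to the query."""
    known = [r for r in history if r.run_time is not None]
    if not known:
        raise ValueError("history contains no completed jobs")
    nearest = sorted(known, key=lambda r: distance(query, r))[:k]
    return sum(r.run_time for r in nearest) / len(nearest)


def response_time(run_time: float, wait_time: float) -> float:
    """Response time is queue wait time plus application run time."""
    return wait_time + run_time


history = [
    JobRecord("alice", "short", 1, 100.0),
    JobRecord("alice", "short", 1, 120.0),
    JobRecord("bob", "long", 8, 5000.0),
]
query = JobRecord("alice", "short", 1)
predicted_run = predict_run_time(query, history, k=2)
predicted_response = response_time(predicted_run, wait_time=30.0)
```

Because the prediction is a lookup over stored instances and not a replay of the scheduler, its cost depends on the size of the history and not on simulating the queue, which is in keeping with the performance argument made in the abstract.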



