优化集群和生产网格上的作业超时

Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07) Pub Date : 2007-05-14 DOI:10.1109/CCGRID.2007.78

T. Glatard, X. Pennec

{"title":"优化集群和生产网格上的作业超时","authors":"T. Glatard, X. Pennec","doi":"10.1109/CCGRID.2007.78","DOIUrl":null,"url":null,"abstract":"This paper presents a method to optimize the timeout value of computing jobs. It relies on a model of the job execution time that considers the job management system latency through a random variable. It also takes into account a proportion of outliers to model either reliable clusters or production grids characterized by faults causing jobs loss. Job management systems are first studied considering classical distributions. Different behaviors are exhibited, depending on the weight of the tail of the distribution and on the amount of outliers. Experimental results are then shown based on the latency distribution and outlier ratios measured on the EGEE grid infrastructure1. Those results show that using the optimal timeout value provided by our method reduces the impact of outliers and leads to a 1.36 speed-up even for reliable systems without outliers.","PeriodicalId":278535,"journal":{"name":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":"{\"title\":\"Optimizing jobs timeouts on clusters and production grids\",\"authors\":\"T. Glatard, X. Pennec\",\"doi\":\"10.1109/CCGRID.2007.78\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper presents a method to optimize the timeout value of computing jobs. It relies on a model of the job execution time that considers the job management system latency through a random variable. It also takes into account a proportion of outliers to model either reliable clusters or production grids characterized by faults causing jobs loss. Job management systems are first studied considering classical distributions. Different behaviors are exhibited, depending on the weight of the tail of the distribution and on the amount of outliers. Experimental results are then shown based on the latency distribution and outlier ratios measured on the EGEE grid infrastructure1. Those results show that using the optimal timeout value provided by our method reduces the impact of outliers and leads to a 1.36 speed-up even for reliable systems without outliers.\",\"PeriodicalId\":278535,\"journal\":{\"name\":\"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)\",\"volume\":\"72 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-05-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"28\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGRID.2007.78\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGRID.2007.78","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 28

摘要

提出了一种优化计算作业超时值的方法。它依赖于一个作业执行时间模型，该模型通过一个随机变量考虑了作业管理系统延迟。它还考虑了一定比例的异常值来模拟可靠集群或以故障导致失业为特征的生产网格。首先研究了经典分布下的作业管理系统。根据分布尾部的权重和异常值的数量，表现出不同的行为。实验结果基于在EGEE网格基础设施上测量的延迟分布和异常值比率1。这些结果表明，使用我们的方法提供的最佳超时值可以减少异常值的影响，即使对于没有异常值的可靠系统，也可以获得1.36的加速。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Optimizing jobs timeouts on clusters and production grids

This paper presents a method to optimize the timeout value of computing jobs. It relies on a model of the job execution time that considers the job management system latency through a random variable. It also takes into account a proportion of outliers to model either reliable clusters or production grids characterized by faults causing jobs loss. Job management systems are first studied considering classical distributions. Different behaviors are exhibited, depending on the weight of the tail of the distribution and on the amount of outliers. Experimental results are then shown based on the latency distribution and outlier ratios measured on the EGEE grid infrastructure1. Those results show that using the optimal timeout value provided by our method reduces the impact of outliers and leads to a 1.36 speed-up even for reliable systems without outliers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid '07)

自引率

0.00%

发文量