Speculative Scheduling for Stochastic HPC Applications

Proceedings of the 48th International Conference on Parallel Processing. Pub Date: 2019-08-05. DOI: 10.1145/3337821.3337890

Ana Gainaru, Guillaume Pallez, Hongyang Sun, P. Raghavan


Citations: 10

Abstract

Emerging fields are developing a growing number of large-scale applications with heterogeneous, dynamic, and data-intensive requirements. These applications put a high emphasis on productivity and thus are not tuned to run efficiently on today's high performance computing (HPC) systems. Some of them, such as neuroscience workloads and those that use adaptive numerical algorithms, develop modeling and simulation workflows with stochastic execution times and unpredictable resource requirements. Deploying them on current HPC systems with existing resource management solutions can result in a loss of efficiency for users and a decrease in effective system utilization for platform providers. In this paper, we consider the current HPC scheduling model and describe the challenge it poses for stochastic applications due to the strict requirements of its job deployment policies. To address this challenge, we present speculative scheduling techniques that adapt the resource requirements of a stochastic application on the fly, based on its past execution behavior rather than on estimates given by the user. We focus on improving overall system utilization and application response time without disrupting the current HPC scheduling model or the application development process. Our solution can operate alongside existing HPC batch schedulers without interfering with their usage modes. We show that speculative scheduling can improve system utilization and average application response time by 25-30% compared to the classical HPC approach.
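
To make the idea of sizing a request from past execution behavior concrete, the following is a minimal Python sketch. It is not the algorithm from the paper. It assumes a simplified cost model in which every reservation is paid for in full, a job that overruns its request is killed and resubmitted with the next request, and only two-step plans are considered. The names `reserved_time` and `best_two_step_plan` and the runtimes in `history` are hypothetical and chosen only for illustration.

```python
from statistics import mean


def reserved_time(runtime, requests):
    """Total time reserved until one request is long enough for `runtime`.

    Assumption: each reservation is charged in full, and a job that exceeds
    its request is killed and rerun from scratch under the next request.
    """
    total = 0.0
    for request in requests:
        total += request
        if runtime <= request:
            return total
    raise ValueError("no request in the plan covers this runtime")


def best_two_step_plan(history):
    """Pick a first request from the observed runtimes that minimizes the
    mean reserved time over `history`; the fallback request is the longest
    runtime seen so far.
    """
    fallback = max(history)
    best = None
    for first in sorted(set(history)):
        plan = [first] if first == fallback else [first, fallback]
        cost = mean([reserved_time(t, plan) for t in history])
        if best is None or cost < best[1]:
            best = (plan, cost)
    return best


# Hypothetical runtimes (hours) of past executions of one application.
history = [2.0, 2.0, 2.0, 2.0, 10.0]
plan, cost = best_two_step_plan(history)
print(plan, cost)
```

Under these assumptions, always requesting the longest observed runtime reserves 10 hours per run, while requesting 2 hours first and falling back to 10 hours reserves 4 hours per run on average over this history. A real scheduler would also have to account for queue wait time and backfilling, which this sketch ignores.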
