E-HPC: a library for elastic resource management in HPC environments

William Fox, D. Ghoshal, Abel Souza, G. Rodrigo, L. Ramakrishnan
Published in: Proceedings of the 12th Workshop on Workflows in Support of Large-Scale Science
Publication date: 2017-11-12
DOI: 10.1145/3150994.3150996 (https://doi.org/10.1145/3150994.3150996)
Citations: 13

Abstract

Next-generation data-intensive scientific workflows need to support streaming and real-time applications with dynamic resource needs on high-performance computing (HPC) platforms. The static resource allocation model on current HPC systems, which was designed for monolithic MPI applications, is insufficient to support the elastic resource needs of current and future workflows. In this paper, we discuss the design, implementation and evaluation of Elastic-HPC (E-HPC), an elastic framework for managing resources for scientific workflows on current HPC systems. E-HPC treats the resource slot for a workflow as an elastic window that may map to different physical resources over the duration of the workflow. Our framework uses checkpoint-restart as the underlying mechanism to migrate workflow execution across the dynamic window of resources. E-HPC provides the foundation necessary to enable the dynamic allocation of HPC resources needed for streaming and real-time workflows. E-HPC has negligible overhead beyond the cost of checkpointing. Additionally, E-HPC decreases the turnaround time of workflows compared to the traditional model of resource allocation for workflows, in which resources are allocated per stage of the workflow. Our evaluation shows that E-HPC improves core-hour utilization for common workflow resource use patterns and provides an effective framework for elastic expansion of resources for applications with dynamic resource needs.
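
The core-hour argument in the abstract can be illustrated with a toy accounting model. The sketch below is not E-HPC's API or the paper's evaluation method: the Stage class, both functions, the example workflow, and the fixed per-resize checkpoint cost are all hypothetical, chosen only to show how resizing an elastic window at stage boundaries might compare with holding one static allocation sized for the widest stage.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One workflow stage (hypothetical model, not an E-HPC type)."""
    name: str
    cores: int    # cores the stage needs
    hours: float  # run time of the stage at that core count


def static_core_hours(stages):
    """Static model: one allocation sized for the widest stage,
    held for the whole duration of the workflow."""
    peak = max(s.cores for s in stages)
    return peak * sum(s.hours for s in stages)


def elastic_core_hours(stages, checkpoint_hours=0.0):
    """Elastic model: the window is resized at stage boundaries.

    Assumption: every resize costs a fixed checkpoint_hours, charged
    at the core count of the stage being checkpointed. No cost is
    charged when consecutive stages use the same number of cores.
    """
    total = 0.0
    for i, stage in enumerate(stages):
        total += stage.cores * stage.hours
        has_next = i < len(stages) - 1
        if has_next and stages[i + 1].cores != stage.cores:
            total += stage.cores * checkpoint_hours
    return total


# Made-up workflow with a narrow, wide, narrow resource pattern.
workflow = [
    Stage("preprocess", 32, 1.0),
    Stage("simulate", 256, 2.0),
    Stage("analyze", 64, 0.5),
]

static_cost = static_core_hours(workflow)
elastic_cost = elastic_core_hours(workflow, checkpoint_hours=0.05)
print(f"static:  {static_cost:.1f} core-hours")
print(f"elastic: {elastic_cost:.1f} core-hours")
```

In this model the elastic window comes out ahead whenever the checkpoint cost is small relative to the core hours left idle by the static allocation. Real checkpoint costs depend on application state size and file system performance, which the sketch does not capture.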