迈向高性能计算作业调度器的普及容器化

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) Pub Date : 2020-09-01 DOI:10.1109/SBAC-PAD49847.2020.00046

C. Cérin, Nicolas Grenèche, Tarek Menouer

{"title":"迈向高性能计算作业调度器的普及容器化","authors":"C. Cérin, Nicolas Grenèche, Tarek Menouer","doi":"10.1109/SBAC-PAD49847.2020.00046","DOIUrl":null,"url":null,"abstract":"In cloud computing, elasticity is defined as \"the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible\". Adding elasticity to HPC (High Performance Computing) clusters management systems remains challenging even if we deploy such HPC systems in today's cloud environments. This difficulty is caused by the fact that HPC jobs scheduler needs to rely on a fixed set of resources. Every change of topology (adding or removing computing resources) leads to a global restart of the HPC jobs scheduler. This phenomenon is not a major drawback because it provides a very effective way of sharing a fixed set of resources but we think that it could be complemented by a more elastic approach. Moreover, the elasticity issue should not be reduced to the scaling of resources issues. Clouds also enable access to various technologies that enhance the services offer to users. In this paper, our approach is to use containers technology to instantiate a tailored HPC environment based on the user's reservation constraints. We claim that the introduction and use of containers in HPC job schedulers allow better management of resources, in a more economical way. From the use case of SLURM, we release a methodology for 'containerization' of HPC jobs schedulers which is pervasive i.e. spreading widely throughout any layers of job schedulers. We also provide initial experiments demonstrating that our containerized SLURM system is operational and promising.","PeriodicalId":202581,"journal":{"name":"2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Towards Pervasive Containerization of HPC Job Schedulers\",\"authors\":\"C. Cérin, Nicolas Grenèche, Tarek Menouer\",\"doi\":\"10.1109/SBAC-PAD49847.2020.00046\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In cloud computing, elasticity is defined as \\\"the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible\\\". Adding elasticity to HPC (High Performance Computing) clusters management systems remains challenging even if we deploy such HPC systems in today's cloud environments. This difficulty is caused by the fact that HPC jobs scheduler needs to rely on a fixed set of resources. Every change of topology (adding or removing computing resources) leads to a global restart of the HPC jobs scheduler. This phenomenon is not a major drawback because it provides a very effective way of sharing a fixed set of resources but we think that it could be complemented by a more elastic approach. Moreover, the elasticity issue should not be reduced to the scaling of resources issues. Clouds also enable access to various technologies that enhance the services offer to users. In this paper, our approach is to use containers technology to instantiate a tailored HPC environment based on the user's reservation constraints. We claim that the introduction and use of containers in HPC job schedulers allow better management of resources, in a more economical way. From the use case of SLURM, we release a methodology for 'containerization' of HPC jobs schedulers which is pervasive i.e. spreading widely throughout any layers of job schedulers. We also provide initial experiments demonstrating that our containerized SLURM system is operational and promising.\",\"PeriodicalId\":202581,\"journal\":{\"name\":\"2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SBAC-PAD49847.2020.00046\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SBAC-PAD49847.2020.00046","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

在云计算中，弹性被定义为“系统能够通过以自主方式提供和取消资源来适应工作负载变化的程度，以便在每个时间点可用资源尽可能与当前需求相匹配”。即使我们在当今的云环境中部署HPC系统，为HPC(高性能计算)集群管理系统增加弹性仍然具有挑战性。造成这种困难的原因是HPC作业调度器需要依赖一组固定的资源。每次拓扑更改(添加或删除计算资源)都会导致HPC作业调度器的全局重新启动。这种现象并不是一个主要的缺点，因为它提供了一种非常有效的方式来共享一套固定的资源，但我们认为可以用一种更有弹性的方法来补充它。此外，不应将弹性问题简化为资源的规模问题。云还支持访问各种技术，以增强向用户提供的服务。在本文中，我们的方法是使用容器技术实例化一个基于用户保留约束的定制HPC环境。我们声称在HPC作业调度器中引入和使用容器可以以更经济的方式更好地管理资源。从SLURM的用例来看，我们发布了一种HPC作业调度器的“容器化”方法，这种方法在任何层的作业调度器中都很普遍。我们还提供了初步实验，证明我们的容器化SLURM系统是可行的和有前途的。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Towards Pervasive Containerization of HPC Job Schedulers

In cloud computing, elasticity is defined as "the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible". Adding elasticity to HPC (High Performance Computing) clusters management systems remains challenging even if we deploy such HPC systems in today's cloud environments. This difficulty is caused by the fact that HPC jobs scheduler needs to rely on a fixed set of resources. Every change of topology (adding or removing computing resources) leads to a global restart of the HPC jobs scheduler. This phenomenon is not a major drawback because it provides a very effective way of sharing a fixed set of resources but we think that it could be complemented by a more elastic approach. Moreover, the elasticity issue should not be reduced to the scaling of resources issues. Clouds also enable access to various technologies that enhance the services offer to users. In this paper, our approach is to use containers technology to instantiate a tailored HPC environment based on the user's reservation constraints. We claim that the introduction and use of containers in HPC job schedulers allow better management of resources, in a more economical way. From the use case of SLURM, we release a methodology for 'containerization' of HPC jobs schedulers which is pervasive i.e. spreading widely throughout any layers of job schedulers. We also provide initial experiments demonstrating that our containerized SLURM system is operational and promising.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

自引率

0.00%

发文量