It's a Scheduling Affair: GROMACS in the Cloud with the KubeFlux Scheduler
Claudia Misale, M. Drocco, Daniel J. Milroy, Carlos Eduardo Arango Gutierrez, Stephen Herbein, D. Ahn, Yoonho Park
2021 3rd International Workshop on Containers and New Orchestration Paradigms for Isolated Environments in HPC (CANOPIE-HPC), November 2021
DOI: 10.1109/CANOPIEHPC54579.2021.00006
Citations: 8
Abstract
In this work, we address the problem of running HPC workloads efficiently on Kubernetes clusters. To do so, we compare the Kubernetes default scheduler with KubeFlux, a Kubernetes plug-in scheduler built on the Flux graph-based scheduler, on a 34-node Red Hat OpenShift cluster on IBM Cloud. We detail how scheduling can affect the performance of GROMACS, a well-known HPC application, and we show that KubeFlux can improve its performance through better pod scheduling. In our tests, KubeFlux tends to limit both the number of subnets spanned by a job and the maximum number of pods per node, which translates to a speedup of more than 2x over the Kubernetes default scheduler in several cases.
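The abstract names two placement properties: how many subnets a job's pods span, and how many of its pods land on the busiest node. The following is a minimal sketch of how these could be computed for a single job. It is not the authors' measurement code. The `placement_metrics` helper, the node names, and the subnet labels are all hypothetical, and the sketch assumes the pod-to-node and node-to-subnet mappings are already known, for example collected from the cluster after scheduling.

```python
from __future__ import annotations

from collections import Counter


def placement_metrics(pod_to_node: dict[str, str],
                      node_to_subnet: dict[str, str]) -> tuple[int, int]:
    """Return (subnets spanned, max pods on any one node) for one job.

    Hypothetical helper: both mappings are assumed inputs, not a real API.
    """
    if not pod_to_node:
        return 0, 0
    # Count how many of the job's pods were placed on each node.
    pods_per_node = Counter(pod_to_node.values())
    # Collect the distinct subnets that host at least one of the job's nodes.
    subnets = {node_to_subnet[node] for node in pods_per_node}
    return len(subnets), max(pods_per_node.values())


# Hypothetical cluster: four nodes in subnet-1, two in subnet-2, one in subnet-3.
node_to_subnet = {
    "n1": "subnet-1", "n2": "subnet-1", "n3": "subnet-1", "n4": "subnet-1",
    "n5": "subnet-2", "n6": "subnet-2",
    "n7": "subnet-3",
}

# A 4-pod job spread one pod per node inside a single subnet.
compact = {"pod-0": "n1", "pod-1": "n2", "pod-2": "n3", "pod-3": "n4"}
# The same job stacked on one node and split across two subnets.
skewed = {"pod-0": "n5", "pod-1": "n5", "pod-2": "n5", "pod-3": "n7"}

print(placement_metrics(compact, node_to_subnet))  # (1, 1)
print(placement_metrics(skewed, node_to_subnet))   # (2, 3)
```

Under the abstract's reading, the first placement is the kind a scheduler should prefer for a tightly coupled code such as GROMACS: fewer subnets to cross and no node packed with several of the job's pods.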