Implementation of a global GPU management plugin for Slurm
Authors: Xue Wu, Xiang Long
Published in: 2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)
Publication date: 2017-02-01
DOI: https://doi.org/10.1109/CIACT.2017.7977294
Citation count: 2
Abstract
Slurm is a widely used resource manager for Linux clusters. It provides several CPU selection plugins whose allocation strategies suit different scenarios. GPU allocation, however, is constrained by the location of the selected CPUs, because a GPU can only be accessed by processes running on the same node. As a result, a job may wait for GPUs even when free GPUs exist elsewhere in the cluster. This paper presents a global GPU management plugin for Slurm. Using remote GPU virtualization, the plugin detaches GPUs from their nodes to form a global GPU pool and decouples GPU allocation from CPU allocation. GPUs in the pool are available to CUDA jobs on any node in the cluster. Furthermore, we implement two GPU selection strategies, best fit and local first. Experiments show that the global GPU management plugin shortens job waiting time and makes efficient use of the GPUs in the cluster.
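The abstract names the two selection strategies but does not define them, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes the global pool can be modeled as a mapping from node name to free GPU count. It assumes "local first" means taking GPUs from the job's own node before borrowing remote ones, and "best fit" means choosing the single node with the fewest free GPUs that still covers the request. The names `Pool`, `Allocation`, `local_first`, `best_fit`, and the spill-over fallback are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical model of the global pool: node name -> number of free GPUs.
Pool = Dict[str, int]
# Result of a selection: node name -> number of GPUs taken from that node.
Allocation = Dict[str, int]


def _spill(pool: Pool, order: List[str], needed: int) -> Optional[Allocation]:
    """Take GPUs from nodes in the given order until the request is met.

    Returns None if the request is empty or the pool cannot satisfy it.
    """
    if needed <= 0 or sum(pool.values()) < needed:
        return None
    alloc: Allocation = {}
    for node in order:
        take = min(pool.get(node, 0), needed)
        if take > 0:
            alloc[node] = take
            needed -= take
        if needed == 0:
            break
    return alloc


def local_first(pool: Pool, job_node: str, needed: int) -> Optional[Allocation]:
    """Assumed 'local first': use the job's own node, then borrow remotely."""
    # Remote nodes are tried from most to least free GPUs (an assumption).
    others = sorted((n for n in pool if n != job_node),
                    key=lambda n: (-pool[n], n))
    return _spill(pool, [job_node] + others, needed)


def best_fit(pool: Pool, needed: int) -> Optional[Allocation]:
    """Assumed 'best fit': the tightest single node that covers the request."""
    if needed <= 0:
        return None
    fitting = [n for n in pool if pool[n] >= needed]
    if fitting:
        # Fewest free GPUs wins; ties are broken by node name.
        node = min(fitting, key=lambda n: (pool[n], n))
        return {node: needed}
    # No single node is large enough: spill across the largest nodes.
    order = sorted(pool, key=lambda n: (-pool[n], n))
    return _spill(pool, order, needed)
```

With a pool of `{"node1": 1, "node2": 2, "node3": 4}`, a two-GPU job on `node1` gets one local GPU and one from `node3` under this reading of local first, while best fit places both GPUs on `node2` and leaves the four-GPU node intact for larger requests. Neither function mutates the pool; a real plugin would also have to update its bookkeeping and release GPUs when jobs finish.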