{"title":"灵活的CPU-GPU异构集群调度框架","authors":"Kittisak Sajjapongse, T. Agarwal, M. Becchi","doi":"10.1109/HiPC.2014.7116892","DOIUrl":null,"url":null,"abstract":"In the last few years, thanks to their computational power and progressively increased programmability, GPUs have become part of HPC clusters. As a result, widely used open-source cluster resource managers (e.g. TORQUE and SLURM) have recently been extended with GPU support capabilities. These systems, however, treat GPUs as dedicated resources and provide scheduling mechanisms that often result in resource underutilization and, thereby, in suboptimal performance. We propose a cluster-level scheduler and integrate it with our previously proposed node-level GPU virtualization runtime [1, 2], thus providing a hierarchical cluster resource management framework that allows the efficient use of heterogeneous CPU-GPU clusters. The scheduling policy used by our system is configurable, and our scheduler provides administrators with a high-level API that allows easily defining custom scheduling policies. We provide two application- and hardware-heterogeneity-aware cluster-level scheduling schemes for hybrid MPI-CUDA applications: co-location- and latency-reduction-based scheduling, and use them in combination with a preemption-based GPU sharing policy implemented at the node-level. We validate our framework on two heterogeneous clusters: one consisting of commodity workstations and the other of high-end nodes with various hardware configurations, and on a mix of communication- and compute-intensive applications. Our experiments show that, by better utilizing the available resources, our scheduling framework outperforms existing batch-schedulers both in terms of throughput and application latency.","PeriodicalId":337777,"journal":{"name":"2014 21st International Conference on High Performance Computing (HiPC)","volume":"127 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A flexible scheduling framework for heterogeneous CPU-GPU clusters\",\"authors\":\"Kittisak Sajjapongse, T. Agarwal, M. Becchi\",\"doi\":\"10.1109/HiPC.2014.7116892\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the last few years, thanks to their computational power and progressively increased programmability, GPUs have become part of HPC clusters. As a result, widely used open-source cluster resource managers (e.g. TORQUE and SLURM) have recently been extended with GPU support capabilities. These systems, however, treat GPUs as dedicated resources and provide scheduling mechanisms that often result in resource underutilization and, thereby, in suboptimal performance. We propose a cluster-level scheduler and integrate it with our previously proposed node-level GPU virtualization runtime [1, 2], thus providing a hierarchical cluster resource management framework that allows the efficient use of heterogeneous CPU-GPU clusters. The scheduling policy used by our system is configurable, and our scheduler provides administrators with a high-level API that allows easily defining custom scheduling policies. We provide two application- and hardware-heterogeneity-aware cluster-level scheduling schemes for hybrid MPI-CUDA applications: co-location- and latency-reduction-based scheduling, and use them in combination with a preemption-based GPU sharing policy implemented at the node-level. 
We validate our framework on two heterogeneous clusters: one consisting of commodity workstations and the other of high-end nodes with various hardware configurations, and on a mix of communication- and compute-intensive applications. Our experiments show that, by better utilizing the available resources, our scheduling framework outperforms existing batch-schedulers both in terms of throughput and application latency.\",\"PeriodicalId\":337777,\"journal\":{\"name\":\"2014 21st International Conference on High Performance Computing (HiPC)\",\"volume\":\"127 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 21st International Conference on High Performance Computing (HiPC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HiPC.2014.7116892\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 21st International Conference on High Performance Computing (HiPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC.2014.7116892","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A flexible scheduling framework for heterogeneous CPU-GPU clusters
In the last few years, thanks to their computational power and progressively increased programmability, GPUs have become part of HPC clusters. As a result, widely used open-source cluster resource managers (e.g., TORQUE and SLURM) have recently been extended with GPU support. These systems, however, treat GPUs as dedicated resources and provide scheduling mechanisms that often result in resource underutilization and, therefore, in suboptimal performance. We propose a cluster-level scheduler and integrate it with our previously proposed node-level GPU virtualization runtime [1, 2], providing a hierarchical cluster resource management framework that enables the efficient use of heterogeneous CPU-GPU clusters. The scheduling policy used by our system is configurable, and our scheduler provides administrators with a high-level API for easily defining custom scheduling policies. We provide two application- and hardware-heterogeneity-aware cluster-level scheduling schemes for hybrid MPI-CUDA applications, based on co-location and on latency reduction, and use them in combination with a preemption-based GPU sharing policy implemented at the node level. We validate our framework on two heterogeneous clusters, one consisting of commodity workstations and the other of high-end nodes with various hardware configurations, using a mix of communication- and compute-intensive applications. Our experiments show that, by better utilizing the available resources, our scheduling framework outperforms existing batch schedulers in both throughput and application latency.
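The abstract mentions a high-level API through which administrators define custom scheduling policies, but does not show it. The sketch below is a rough illustration only: every name in it (SchedulingPolicy, NodeState, JobRequest, CoLocationPolicy) is hypothetical and not taken from the paper. It shows one way a cluster-level scheduler could expose a pluggable policy interface, with a greedy co-location-style placement that packs an MPI-CUDA job's ranks onto as few GPU-equipped nodes as possible.

```cpp
// Hypothetical sketch only: the paper does not publish its scheduler API, so all
// names here are invented for illustration.
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct NodeState {
    std::string hostname;
    int freeCpuCores = 0;
    int freeGpus = 0;   // GPUs not currently claimed by a running rank
};

struct JobRequest {
    int mpiRanks = 0;
    int gpusPerRank = 0;
};

// Interface an administrator would implement and register with the scheduler.
class SchedulingPolicy {
public:
    virtual ~SchedulingPolicy() = default;
    // Returns one node index per MPI rank, or an empty vector if the job must wait.
    virtual std::vector<std::size_t> place(const JobRequest& job,
                                           const std::vector<NodeState>& nodes) = 0;
};

// Example policy: repeatedly pick the node with the most free GPUs and assign as
// many ranks to it as fit, so ranks of the same job end up co-located.
class CoLocationPolicy : public SchedulingPolicy {
public:
    std::vector<std::size_t> place(const JobRequest& job,
                                   const std::vector<NodeState>& nodes) override {
        if (nodes.empty() || job.mpiRanks <= 0) return {};
        std::vector<std::size_t> placement;
        std::vector<int> freeGpus;
        for (const NodeState& n : nodes) freeGpus.push_back(n.freeGpus);

        int ranksLeft = job.mpiRanks;
        while (ranksLeft > 0) {
            // Node with the most free GPUs gets the next batch of ranks.
            std::size_t best = 0;
            for (std::size_t i = 1; i < nodes.size(); ++i)
                if (freeGpus[i] > freeGpus[best]) best = i;
            int fit = (job.gpusPerRank > 0) ? freeGpus[best] / job.gpusPerRank
                                            : ranksLeft;
            if (fit == 0) return {};  // not enough free GPUs anywhere: queue the job
            int assigned = std::min(fit, ranksLeft);
            for (int r = 0; r < assigned; ++r) placement.push_back(best);
            freeGpus[best] -= assigned * job.gpusPerRank;
            ranksLeft -= assigned;
        }
        return placement;
    }
};
```

In such a design, the preemption-based GPU sharing described in the abstract would sit below this layer, inside the node-level virtualization runtime, rather than in the cluster-level policy itself.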