Interference-driven resource management for GPU-based heterogeneous clusters

R. Phull, Cheng-Hong Li, Kunal Rao, S. Cadambi, S. Chakradhar
DOI: 10.1145/2287076.2287091
Venue: IEEE International Symposium on High-Performance Parallel Distributed Computing
Publication date: 2012-06-18
Citations: 35

Abstract

GPU-based clusters are increasingly being deployed in HPC environments to accelerate a variety of scientific applications. Despite their growing popularity, the GPU devices themselves are under-utilized even for many computationally-intensive jobs. This stems from the fact that the typical GPU usage model is one in which a host processor periodically offloads computationally intensive portions of an application to the coprocessor. Since some portions of code cannot be offloaded to the GPU (for example, code performing network communication in MPI applications), this usage model results in periods of time when the GPU is idle. GPUs could be time-shared across jobs to "fill" these idle periods, but unlike CPU resources such as the cache, the effects of sharing the GPU are not well understood. Specifically, two jobs that time-share a single GPU will experience resource contention and interfere with each other. The resulting slow-down could lead to missed job deadlines. Current cluster managers do not support GPU-sharing, but instead dedicate GPUs to a job for the job's lifetime.

In this paper, we present a framework to predict and handle interference when two or more jobs time-share GPUs in HPC clusters. Our framework consists of an analysis model, and a dynamic interference detection and response mechanism to detect excessive interference and restart the interfering jobs on different nodes. We implement our framework in Torque, an open-source cluster manager, and using real workloads on an HPC cluster, show that interference-aware two-job colocation (although our method is applicable to colocating more than two jobs) improves GPU utilization by 25%, reduces a job's waiting time in the queue by 39% and improves job latencies by around 20%.
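The dynamic detection-and-response mechanism described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the slowdown threshold, job names, and helper function are all hypothetical. The core idea is to compare each colocated job's observed GPU-phase time against its solo baseline and flag jobs whose slowdown is excessive for restart on a different node.

```python
# Hypothetical sketch of interference detection: the threshold and the
# per-job timing data are illustrative assumptions, not values from the paper.
SLOWDOWN_THRESHOLD = 1.5  # assumed tolerance before a job risks missing its deadline

def jobs_to_restart(baseline_ms, observed_ms, threshold=SLOWDOWN_THRESHOLD):
    """Return the ids of jobs whose slowdown under GPU time-sharing,
    relative to their solo baseline, exceeds the threshold; the cluster
    manager would restart these on different nodes."""
    return [job for job in observed_ms
            if observed_ms[job] / baseline_ms[job] > threshold]

# Example: job "b" slows down 2x when sharing the GPU, exceeding the threshold.
baseline = {"a": 10.0, "b": 12.0}
observed = {"a": 13.0, "b": 24.0}
print(jobs_to_restart(baseline, observed))  # ['b']
```

In practice such a check would run periodically inside the cluster manager's monitoring loop, using measured kernel times rather than static dictionaries.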