Symphony:基于协处理器的异构集群上客户机-服务器应用程序的调度器

2011 IEEE International Conference on Cluster Computing Pub Date : 2011-09-26 DOI:10.1109/CLUSTER.2011.46

M. M. Rafique, S. Cadambi, Kunal Rao, A. Butt, S. Chakradhar

{"title":"Symphony:基于协处理器的异构集群上客户机-服务器应用程序的调度器","authors":"M. M. Rafique, S. Cadambi, Kunal Rao, A. Butt, S. Chakradhar","doi":"10.1109/CLUSTER.2011.46","DOIUrl":null,"url":null,"abstract":"Coprocessors such as GPUs are increasingly being deployed in clusters to process scientific and compute-intensive jobs. In this work, we study if GPU-based heterogeneous clusters can benefit client-server applications. Specifically, we consider the practical situation where multiple client-server applications share a heterogeneous cluster (multi-tenancy), and experience unpredictable variations in incoming client request rates, including steep load spikes. Even for \"compute-intensive\" client-server applications, it is unclear if a GPU-based cluster can seamlessly deliver acceptable response times in the presence of multi-tenancy and load spikes. We argue that a cluster-level scheduler that is aware of application load, request deadlines and the heterogeneity is necessary in this situation. We propose a novel scheduler called Symphony that enables efficient, dynamic sharing of a GPU-based heterogeneous cluster across multiple concurrently-executing client-server applications, each with arbitrary load spikes. Symphony performs three key tasks: it (i) monitors the load on each application, (ii) collects past performance data and dynamically builds simple performance models of available processing resources and (iii) computes a priority for pending requests based on the above parameters and the requests' slack. Based on this, it reorders client requests across different applications to achieve acceptable response times. We also define how client-server applications should interact with a scheduler such as Symphony, and develop an API to this end. We deploy Symphony as user-space middleware on a high-end heterogeneous cluster with dual quad-core Xeon CPUs and dual NVIDIA Fermi GPUs. An evaluation using representative applications shows that in the presence of load spikes (i) Symphony incurs 2-20x fewer requests that do not meet response time constraints compared with other schedulers, and (ii) in order to achieve the same performance as Symphony, other schedulers need 2x more cluster nodes.","PeriodicalId":200830,"journal":{"name":"2011 IEEE International Conference on Cluster Computing","volume":"89 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Symphony: A Scheduler for Client-Server Applications on Coprocessor-Based Heterogeneous Clusters\",\"authors\":\"M. M. Rafique, S. Cadambi, Kunal Rao, A. Butt, S. Chakradhar\",\"doi\":\"10.1109/CLUSTER.2011.46\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Coprocessors such as GPUs are increasingly being deployed in clusters to process scientific and compute-intensive jobs. In this work, we study if GPU-based heterogeneous clusters can benefit client-server applications. Specifically, we consider the practical situation where multiple client-server applications share a heterogeneous cluster (multi-tenancy), and experience unpredictable variations in incoming client request rates, including steep load spikes. Even for \\\"compute-intensive\\\" client-server applications, it is unclear if a GPU-based cluster can seamlessly deliver acceptable response times in the presence of multi-tenancy and load spikes. We argue that a cluster-level scheduler that is aware of application load, request deadlines and the heterogeneity is necessary in this situation. We propose a novel scheduler called Symphony that enables efficient, dynamic sharing of a GPU-based heterogeneous cluster across multiple concurrently-executing client-server applications, each with arbitrary load spikes. Symphony performs three key tasks: it (i) monitors the load on each application, (ii) collects past performance data and dynamically builds simple performance models of available processing resources and (iii) computes a priority for pending requests based on the above parameters and the requests' slack. Based on this, it reorders client requests across different applications to achieve acceptable response times. We also define how client-server applications should interact with a scheduler such as Symphony, and develop an API to this end. We deploy Symphony as user-space middleware on a high-end heterogeneous cluster with dual quad-core Xeon CPUs and dual NVIDIA Fermi GPUs. An evaluation using representative applications shows that in the presence of load spikes (i) Symphony incurs 2-20x fewer requests that do not meet response time constraints compared with other schedulers, and (ii) in order to achieve the same performance as Symphony, other schedulers need 2x more cluster nodes.\",\"PeriodicalId\":200830,\"journal\":{\"name\":\"2011 IEEE International Conference on Cluster Computing\",\"volume\":\"89 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-09-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE International Conference on Cluster Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CLUSTER.2011.46\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Conference on Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLUSTER.2011.46","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

像gpu这样的协处理器正越来越多地部署在集群中，以处理科学和计算密集型的工作。在这项工作中，我们研究基于gpu的异构集群是否可以使客户机-服务器应用程序受益。具体地说，我们考虑了多个客户机-服务器应用程序共享异构集群(多租户)的实际情况，并经历了传入客户机请求率的不可预测变化，包括急剧的负载峰值。即使对于“计算密集型”客户机-服务器应用程序，也不清楚基于gpu的集群是否能够在多租户和负载峰值存在的情况下无缝地提供可接受的响应时间。我们认为，在这种情况下，需要一个能够感知应用程序负载、请求截止日期和异质性的集群级调度器。我们提出了一种名为Symphony的新型调度器，它支持跨多个并发执行的客户机-服务器应用程序(每个应用程序都具有任意负载峰值)高效、动态地共享基于gpu的异构集群。Symphony执行三个关键任务:它(i)监控每个应用程序的负载，(ii)收集过去的性能数据并动态构建可用处理资源的简单性能模型，以及(iii)基于上述参数和请求的空闲计算待处理请求的优先级。在此基础上，它跨不同应用程序重新排序客户机请求，以实现可接受的响应时间。我们还定义了客户机-服务器应用程序应该如何与调度器(如Symphony)交互，并为此开发了一个API。我们将Symphony部署为用户空间中间件，部署在具有双四核至强cpu和双NVIDIA Fermi gpu的高端异构集群上。使用代表性应用程序的评估表明，在存在负载峰值的情况下(i)与其他调度器相比，Symphony产生的不满足响应时间限制的请求减少了2-20倍;(ii)为了达到与Symphony相同的性能，其他调度器需要多2倍的集群节点。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Symphony: A Scheduler for Client-Server Applications on Coprocessor-Based Heterogeneous Clusters

Coprocessors such as GPUs are increasingly being deployed in clusters to process scientific and compute-intensive jobs. In this work, we study if GPU-based heterogeneous clusters can benefit client-server applications. Specifically, we consider the practical situation where multiple client-server applications share a heterogeneous cluster (multi-tenancy), and experience unpredictable variations in incoming client request rates, including steep load spikes. Even for "compute-intensive" client-server applications, it is unclear if a GPU-based cluster can seamlessly deliver acceptable response times in the presence of multi-tenancy and load spikes. We argue that a cluster-level scheduler that is aware of application load, request deadlines and the heterogeneity is necessary in this situation. We propose a novel scheduler called Symphony that enables efficient, dynamic sharing of a GPU-based heterogeneous cluster across multiple concurrently-executing client-server applications, each with arbitrary load spikes. Symphony performs three key tasks: it (i) monitors the load on each application, (ii) collects past performance data and dynamically builds simple performance models of available processing resources and (iii) computes a priority for pending requests based on the above parameters and the requests' slack. Based on this, it reorders client requests across different applications to achieve acceptable response times. We also define how client-server applications should interact with a scheduler such as Symphony, and develop an API to this end. We deploy Symphony as user-space middleware on a high-end heterogeneous cluster with dual quad-core Xeon CPUs and dual NVIDIA Fermi GPUs. An evaluation using representative applications shows that in the presence of load spikes (i) Symphony incurs 2-20x fewer requests that do not meet response time constraints compared with other schedulers, and (ii) in order to achieve the same performance as Symphony, other schedulers need 2x more cluster nodes.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE International Conference on Cluster Computing

自引率

0.00%

发文量