A comparison of local and gang scheduling on a Beowulf cluster

P. Strazdins, Johannes Uhlmann
{"title":"A comparison of local and gang scheduling on a Beowulf cluster","authors":"P. Strazdins, Johannes Uhlmann","doi":"10.1109/CLUSTR.2004.1392601","DOIUrl":null,"url":null,"abstract":"Gang scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed memory parallel computers. This is because they minimize context switching overheads and permit the parallel job currently running to progress at the fastest possible rate. However, in the case of cluster computers, and particularly those with COTS networks, these benefits can be outweighed in the multiple jobs time-sharing context by the loss the ability to utilize the CPU for other jobs when the current job is waiting for messages. Experiments on a Linux Beowulf cluster with 100 Mb fast Ethernet switches are made comparing the SCore buddy-based gang scheduling with local scheduling (provided by the Linux 2.4 kernel with MPI implemented over TCP/IP). Results for communication-intensive numerical applications on 16 nodes reveal that gang scheduling results in 'slowdowns ' up to a factor of two greater for 8 simultaneous jobs. This phenomenon is not due to any deficiencies in SCore but due to the relative costs of context switching versus message overhead, and we expect similar results holds for any gang scheduling implementation. A performance analysis of local scheduling indicates that cache pollution due to context switching is more significant than the direct context switching overhead on the applications studied. When this is taken into account, local scheduling behaviour comes close to achieving ideal slowdowns for finer-grained computations such as Linpack. The performance models also indicate that similar trends are to be expected for clusters with faster networks.","PeriodicalId":123512,"journal":{"name":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2004-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2004 IEEE International Conference on Cluster Computing (IEEE Cat. No.04EX935)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLUSTR.2004.1392601","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 26

Abstract

Gang scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed-memory parallel computers. This is because they minimize context switching overheads and permit the currently running parallel job to progress at the fastest possible rate. However, in the case of cluster computers, and particularly those with COTS networks, these benefits can be outweighed, when multiple jobs are time-shared, by the loss of the ability to utilize the CPU for other jobs while the current job is waiting for messages. Experiments are performed on a Linux Beowulf cluster with 100 Mb Fast Ethernet switches, comparing SCore's buddy-based gang scheduling with local scheduling (provided by the Linux 2.4 kernel, with MPI implemented over TCP/IP). Results for communication-intensive numerical applications on 16 nodes reveal that gang scheduling results in 'slowdowns' up to a factor of two greater than those under local scheduling for 8 simultaneous jobs. This phenomenon is not due to any deficiency in SCore but to the relative costs of context switching versus message overhead, and we expect similar results to hold for any gang scheduling implementation. A performance analysis of local scheduling indicates that cache pollution due to context switching is more significant than the direct context switching overhead for the applications studied. When this is taken into account, local scheduling behaviour comes close to achieving ideal slowdowns for finer-grained computations such as Linpack. The performance models also indicate that similar trends are to be expected for clusters with faster networks.
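The abstract reports its results in terms of a 'slowdown' metric without stating it explicitly; a minimal sketch of the usual definition, assumed here, is the ratio of a job's run time when time-sharing its nodes with other jobs to its run time on dedicated nodes, so that n identical simultaneous jobs would ideally each see a slowdown of n:

\[
\mathrm{slowdown}(J) \;=\; \frac{T_{\mathrm{shared}}(J)}{T_{\mathrm{dedicated}}(J)},
\qquad
\mathrm{slowdown}_{\mathrm{ideal}} \;=\; n \quad \text{for } n \text{ simultaneous jobs.}
\]

Under this reading, the headline result is that at n = 8 the gang-scheduled slowdowns are up to twice those measured under local scheduling.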