Highly Efficient Gang Scheduling Implementation

A. Hori, H. Tezuka, Y. Ishikawa
Proceedings of the IEEE/ACM SC98 Conference, November 7, 1998. DOI: 10.1109/SC.1998.10007
Citations: 37

Abstract

A new, highly efficient gang scheduling implementation technique is the basis for this paper. Network preemption, in which network interface contexts are saved and restored, has already been proposed to enable parallel applications to perform efficient user-level communication. This network preemption technique can also be used to detect global states, such as deadlock, of a parallel program execution. A gang scheduler, SCore-D, using the network preemption technique is implemented with PM, a user-level communication library. This paper evaluates network preemption gang scheduling overhead using eight NAS parallel benchmark programs. The results of this evaluation show that saving and restoring network contexts occupies almost half of the total gang scheduling overhead. A new mechanism is proposed that maintains multiple network contexts and merely switches the context pointers, without saving and restoring the network contexts. The NAS parallel benchmark evaluation shows that gang scheduling overhead is almost halved. The maximum gang scheduling overhead among the benchmark programs is less than 10% with a 40 msec time slice on 64 single-way PentiumPros, connected by Myrinet to form a PC cluster. Secondary cache misses are counted, and network preemption with multiple network contexts is found to be more cache-effective than with a single network context. The observed scheduling overhead for applications running on 64 nodes is only a small percentage of the execution time. The gang scheduling overheads of switching between two NAS parallel benchmark programs are also evaluated. The additional overheads are less than 2% in most cases, with a 100 msec time slice on 64 nodes. This slightly higher scheduling overhead, compared with switching a single parallel process, comes from more frequent cache misses.

This paper contributes the following findings: i) gang scheduling overhead with network preemption can be sufficiently low; ii) the proposed network preemption with multiple network contexts is more cache-effective than with a single network context; and iii) network preemption can be applied to detect global states of user parallel processes. The SCore-D gang scheduler, realized with network preemption, can better utilize processor resources by detecting the global state of user parallel processes. Network preemption with multiple contexts enables highly efficient gang scheduling. The combination of low scheduling overhead and the global state detection mechanism achieves an interactive parallel programming environment in which parallel program development and production runs of parallel programs can be mixed freely.
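The core idea the abstract describes — replacing a bulk save/restore of the single network interface context with a cheap pointer switch among multiple resident contexts — can be sketched as follows. This is an illustrative model only, not the actual SCore-D/PM implementation: the struct layout, sizes, and function names (`net_context`, `switch_save_restore`, `switch_pointer`) are assumptions made for the sketch.

```c
#include <string.h>

/* Hypothetical, simplified model of a network interface context.
 * A real PM/Myrinet context holds send/receive buffers and DMA state;
 * the size and fields here are illustrative only. */
#define CTX_WORDS 4096
#define MAX_PROCS 4

typedef struct {
    unsigned long regs[CTX_WORDS]; /* stand-in for NIC buffer/DMA state */
} net_context;

/* --- Single-context scheme: one context on the NIC, bulk copy on switch --- */
net_context nic_ctx;              /* the single context resident on the NIC */
net_context saved[MAX_PROCS];     /* host-side saved copies, one per process */

void switch_save_restore(int from, int to)
{
    /* The two full-context copies are what dominates gang scheduling
     * overhead in the single-context scheme. */
    memcpy(&saved[from], &nic_ctx, sizeof nic_ctx); /* save outgoing */
    memcpy(&nic_ctx, &saved[to], sizeof nic_ctx);   /* restore incoming */
}

/* --- Multiple-context scheme: all contexts stay resident; switch a pointer --- */
net_context nic_ctxs[MAX_PROCS];  /* all contexts resident on the NIC */
net_context *current_ctx = &nic_ctxs[0];

void switch_pointer(int to)
{
    /* No bulk copy at all: the scheduler just redirects the context
     * pointer, so the switch cost is constant and the copies (and the
     * cache misses they cause) disappear. */
    current_ctx = &nic_ctxs[to];
}
```

Under this model, the paper's result that overhead is almost halved is intuitive: the single-context switch costs two `sizeof(net_context)` copies per time slice, while the pointer switch costs a single store regardless of context size.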