{"title":"Improving parallel job scheduling by combining gang scheduling and backfilling techniques","authors":"Yanyong Zhang, A. Sivasubramaniam, H. Franke, J. Moreira","doi":"10.1109/IPDPS.2000.845975","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845975","url":null,"abstract":"Two different approaches have been commonly used to address problems associated with space sharing scheduling strategies: (a) augmenting space sharing with backfilling, which performs out of order job scheduling; and (b) augmenting space sharing with time sharing, using a technique called coscheduling or gang scheduling. With three important experimental results (impact of priority queue order on backfilling, impact of overestimation of job execution times, and comparison of scheduling techniques), this paper presents an integrated strategy that combines backfilling with gang scheduling. Using extensive simulations based on detailed models of realistic workloads, the benefits of combining backfilling and gang scheduling are clearly demonstrated over a spectrum of performance criteria.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122316501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using time skewing to eliminate idle time due to memory bandwidth and network limitations","authors":"D. Wonnacott","doi":"10.1109/IPDPS.2000.845979","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845979","url":null,"abstract":"Time skewing is a compile-time optimization that can provide arbitrarily high cache hit rates for a class of iterative calculations, given a sufficient number of time steps and sufficient cache memory. Thus, it can eliminate processor idle time caused by inadequate main memory bandwidth. In this article, we give a generalization of time skewing for multiprocessor architectures, and discuss time skewing for multilevel caches. Our generalization for multiprocessors lets us eliminate processor idle time caused by any combination of inadequate main memory bandwidth, limited network bandwidth, and high network latency, given a sufficiently large problem and sufficient cache. As in the uniprocessor case, the cache requirement grows with the machine balance rather than the problem size. Our techniques for using multilevel caches reduce the L1 cache requirement, which would otherwise be unacceptably high for some architectures when using arrays of high dimension.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127407046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-stabilizing mutual exclusion using unfair distributed scheduler","authors":"A. Datta, M. Potop-Butucaru, S. Tixeuil","doi":"10.1109/IPDPS.2000.846023","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846023","url":null,"abstract":"A self-stabilizing algorithm, regardless of the initial system state, converges in finite time to a set of states that satisfy a legitimacy predicate without the need for an explicit exception handler or backward recovery. Mutual exclusion is fundamental in the area of distributed computing, by serializing the accesses to a common shared resource. All existing probabilistic self-stabilizing mutual exclusion algorithms designed to work under an unfair distributed scheduler suffer from the following common drawback: once stabilized, there exists no upper bound on the time between two executions of the critical section at a given node. We present the first probabilistic self-stabilizing algorithm that guarantees such a bound (O(n^3), where n is the network size) while working under an unfair distributed scheduler. As the scheduling adversary gets weaker the bound gets better. Our algorithm works in an anonymous unidirectional ring of any size and has an O(n^3) expected stabilization time.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115673823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Skiplist-based concurrent priority queues","authors":"N. Shavit, Itay Lotan","doi":"10.1109/IPDPS.2000.845994","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845994","url":null,"abstract":"This paper addresses the problem of designing scalable concurrent priority queues for large scale multiprocessor machines with up to several hundred processors. Priority queues are fundamental in the design of modern multiprocessor algorithms, with many classical applications ranging from numerical algorithms through discrete event simulation and expert systems. While highly scalable approaches have been introduced for the special case of queues with a fixed set of priorities, the most efficient designs for the general case are based on the parallelization of the heap data structure. Though numerous intricate heap-based schemes have been suggested in the literature, their scalability seems to be limited to small machines in the range of ten to twenty processors. This paper proposes an alternative approach: to base the design of concurrent priority queues on the probabilistic skiplist data structure, rather than on a heap. To this end, we show that a concurrent skiplist structure, following a simple set of modifications, provides a concurrent priority queue with a higher level of parallelism and significantly less contention than the fastest known heap-based algorithms. Our initial empirical evidence, collected on a simulated 256 node shared memory multiprocessor architecture similar to the MIT Alewife, suggests that the new skiplist based priority queue algorithm scales significantly better than heap based schemes throughout most of the concurrency range. With 256 processors, they are about twice as fast in performing deletions and up to 8 times faster in performing insertions.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129885283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reduction optimization in heterogeneous cluster environments","authors":"Pangfeng Liu, Da-Wei Wang","doi":"10.1109/IPDPS.2000.846024","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846024","url":null,"abstract":"A network of workstations (NOW) is a cost-effective alternative to massively parallel supercomputers. As commercially available off-the-shelf processors become cheaper and faster, it is now possible to build a cluster that provides high computing power within a limited budget. However, a cluster may consist of different types of processors and this heterogeneity complicates the design of efficient collective communication protocols. For example, it is a very hard combinatorial problem to find an optimal reduction schedule for such heterogeneous clusters. Nevertheless, we show that a simple technique called slowest-node-first (SNF) is very effective in designing efficient reduction protocols for heterogeneous clusters. First, we show that SNF is actually an approximation algorithm with competitive ratio two. In addition, we show that SNF does give the optimal reduction time when the cluster consists of two types of processors, and the ratio of communication speed between them is at least two.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128909890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sorting on the OTIS-mesh","authors":"A. Osterloh","doi":"10.1109/IPDPS.2000.845995","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845995","url":null,"abstract":"In this paper we present sorting algorithms on the recently introduced N^2-processor OTIS-mesh, a network with diameter 4√N-3 consisting of N connected meshes of size √N×√N. We show that k-k sorting can be done in 8√N+O(N^(1/3)) steps for k=1, 2, 3, 4 and in 2k√N+O(kN^(1/3)) steps for k>4 with constant buffer-size for all k. We show how our algorithms can be modified to achieve 4√N+O(N^(1/3)) steps for k=1, 2, 3, 4 and k√N+O(kN^(1/3)) steps for k>4 in the average case. Finally, we show a lower bound of max{4√N, (1/√2)k√N} steps for k-k sorting.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122561836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sorting multisets in anonymous rings","authors":"P. Flocchini, E. Kranakis, N. Santoro, D. Krizanc, F. Luccio","doi":"10.1109/IPDPS.2000.845996","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845996","url":null,"abstract":"An anonymous ring network is a ring where all processors (vertices) are totally indistinguishable except for their input value. Initially, each vertex of the ring is associated with a value from a totally ordered set, thus forming a multiset. In this paper we consider the problem of sorting such a distributed multiset and we investigate its relationship with the election problem. We focus on the computability and the complexity of these problems, as well as on their interrelationship, providing strong characterizations, showing lower bounds, and establishing efficient upper bounds.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122955655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the scheduling algorithm of the dynamically trace scheduled VLIW architecture","authors":"A. D. Souza, P. Rounce","doi":"10.1109/IPDPS.2000.846036","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846036","url":null,"abstract":"In a machine that follows the dynamically trace scheduled VLIW (DTSVLIW) architecture, VLIW instructions are built dynamically through an algorithm that can be implemented in hardware. These VLIW instructions are cached so that the machine can spend most of its time executing VLIW instructions without sacrificing any binary compatibility. This paper evaluates the effectiveness of the DTSVLIW instruction-scheduling algorithm by comparing it with the first come first served (FCFS) algorithm, used for microinstruction compaction, and the greedy algorithm, used by the Dynamic Instruction Formatting (DIF) architecture. We also present comparisons between the DTSVLIW, pure VLIW, and the PowerPC 620 processor. Our results show that the DTSVLIW scheduling algorithm has almost the same performance as the greedy and FCFS algorithms. The results also show that the DTSVLIW performs better than the DIF for important machine configurations, better than pure VLIW implementations in most cases, and better than the PowerPC 620 using equivalent hardware resources.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124350773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Switch scheduling in the multimedia router (MMR)","authors":"D. S. Love, S. Yalamanchili, J. Duato, María Blanca Caminero, F. Quiles","doi":"10.1109/IPDPS.2000.845958","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.845958","url":null,"abstract":"The primary goal of the Multimedia Router (MMR) project is the design and implementation of a router optimized for multimedia applications. The router is targeted for use in cluster and LAN interconnection networks which offer different constraints and therefore differing router solutions than WANs. This paper describes and evaluates a switch scheduling algorithm based on a priority biasing scheme for dynamically updating the priorities of the connections established through the router. Unlike existing schemes that simply use the age of a flit as its priority, the novel feature of the proposed approach is that the priority is biased using the measured quality of service (QoS) values for the connection. Furthermore, the structure of the switch scheduling algorithm is motivated by opportunities for pipelined and concurrent operation so that scheduling decisions could be made at switching speeds. The performance of two of the many possible biasing functions is evaluated.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"606 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116074559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel implementation of a fast multipole based 3-D capacitance extraction program on distributed memory multicomputers","authors":"Yanhong Yuan, P. Banerjee","doi":"10.1109/IPDPS.2000.846002","DOIUrl":"https://doi.org/10.1109/IPDPS.2000.846002","url":null,"abstract":"Very fast and accurate 3-D capacitance extraction is essential for interconnect optimization in ultra deep sub-micron (UDSM) designs. Parallel processing provides an approach to reducing the simulation turn-around time. This paper examines the parallelization of the well known fast multipole based 3-D capacitance extraction program FASTCAP, which employs new preconditioning and adaptive techniques. To account for the complicated data dependencies in the unstructured problems, we propose a generalized cost function model, which can be used to accurately measure the workload associated with each cube in the hierarchy. We then present two adaptive partitioning schemes, combined with efficient communication mechanisms with bounded buffer size, to reduce the parallel processing overhead. The overall load balance is achieved through balancing the load at each level of the multipole computation. We report detailed performance results using a variety of standard benchmarks on 3-D capacitance extraction, on an IBM SP2.","PeriodicalId":206541,"journal":{"name":"Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2000-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126429673","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}