{"title":"Key Message Algorithm: a communication optimization algorithm in cluster-based parallel computing","authors":"M. Zhu, Wentong Cai, Bu-Sung Lee","doi":"10.1109/IWCC.1999.810816","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810816","url":null,"abstract":"Parallel computing using Network of Workstations (NOWs) has become very popular recently. However, the execution of parallel applications on such systems has been hampered by the high communication overhead. To reduce the communication overhead and to speedup the execution of parallel applications on NOWs, the paper proposes a Key Message approach that minimizes the cost of message passing in a parallel application by prioritizing communications in the underlying shared communication network. We first describe the queueing network model on which our approach is based, then introduce the algorithm that identifies the messages to be prioritized in a parallel application, and finally discuss the results obtained. Our preliminary analysis of the algorithm on randomly generated task graphs shows improvement over the system without using the prioritization scheme.","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115595248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nomad: a scalable operating system for clusters of uni- and multiprocessors","authors":"Eduardo Pinheiro, R. Bianchini","doi":"10.1109/IWCC.1999.810831","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810831","url":null,"abstract":"The recent improvements in workstation and interconnection network performance have popularized the clusters of off-the-shelf workstations. However, the usefulness of these clusters is yet to be fully exploited, mostly due to the inadequate management of cluster resources implemented by current distributed operating systems. In order to eliminate this problem and approach the computational power of large clusters of workstations, in this paper we propose Nomad, an efficient operating system for clusters of uni and/or multiprocessors. Nomad includes several important characteristics for modern cluster-oriented operating systems: scalability, efficient resource management across the cluster, efficient scheduling of parallel and distributed applications, distributed I/O, fault detection and recovery, protection, and backward compatibility. Some of the mechanisms used by Nomad, such as process checkpointing and migration, can be found in previously proposed systems. However, our system stands out for its strategy for disseminating information across the cluster and its efficient management of all cluster resources. In addition, Nomad is highly scalable as it uses neither centralized control nor extra messages to implement its functionality, taking advantage of the I/O traffic associated with its distributed file system. Our preliminary evaluation of the load balancing aspect of Nomad shows that the pattern of file accesses in our distributed file system allows for efficient and scalable load balancing. Our main conclusion is that the complete implementation of Nomad will most likely be efficient and will be a nice platform for future research on operating systems for clusters of workstations.","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129809909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Algorithms for stable sorting to minimize communications in networks of workstations and their implementations in BSP","authors":"C. Cérin, J. Gaudiot","doi":"10.1109/IWCC.1999.810815","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810815","url":null,"abstract":"We introduce a novel approach to produce BSP (Bulk Synchronous Programming model) programs and we show their efficiency by implementing the stable sorting problem on clusters of PC. Experimental results on PCs based on Ethernet and Myrinet cards are compared with implementations on an SGI 2000. The algorithms presented in the paper are either developed under the theoretical framework of the Regular Sampling technique which guarantees good load balancing properties or are inspired by the technique in order to decrease the sequential work of each processor comparing to the Regular Sampling technique but impose no (theoretical) bound on load balancing. The main sequential block of code used in the algorithms for local sorting are derivatives of Shellsort (which is stable) and a new code based on Quicksort (which is not stable) plus a property on real numbers that is used for stable sorting under the framework of BSR (Broadcast with Selective Reduction).","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128763503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparing the communication performance and scalability of a Linux and a NT cluster of PCs, a Cray origin 2000, an IBM SP and a Cray T3E-600","authors":"G. Luecke, B. Raffin, James Coyle","doi":"10.1109/IWCC.1999.810806","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810806","url":null,"abstract":"The paper presents scalability and communication performance results for a cluster of PCs running Linux with the GM communication library, a cluster of PCs running Windows NT with the HPVM communication library, a Cray T3E-600, an IBM SP and a Cray Origin 2000. Both PC clusters were using a Myrinet network. Six communication tests using MPI routines were run for a variety of message sizes and numbers of processors. The tests were chosen to represent commonly used communication patterns with low contention (a ping-pong between processors, a right shift, a binary tree broadcast and a synchronization barrier) to communication patterns with high contention (a naive broadcast and an all-to-all). For most of the tests, the T3E provides the best performance and scalability. For an 8 byte message the NT cluster performs about the same as the T3E for most of the tests. For all the tests but one, the T3E, the Origin and the SP outperform the two clusters for the largest message size (10 Kbytes or 1 Mbyte).","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124212905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unobtrusive workstation farming without inconveniencing owners: learning Backgammon with a genetic algorithm","authors":"P. Darwen","doi":"10.1109/IWCC.1999.810900","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810900","url":null,"abstract":"Most efforts at low-cost parallel computing assume a monopoly on the hardware being used. That all-or-nothing attitude ignores many machines dedicated to other activities, but which sit idle for 16 hours a day. However naive attempts to utilize idle machines can interfere with their primary purpose. This paper describes the successful effort to unobtrusively farm idle machines, for an artificial intelligence system using a genetic algorithm to learn the game Backgammon. It maintains owners' full access to their machines, without causing any detectable interference.","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"199 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121606403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formal specification of virtual process topologies","authors":"K. Kazemi, C. McDonald","doi":"10.1109/IWCC.1999.810822","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810822","url":null,"abstract":"A lack of adequate and flexible topology support in the popular message passing systems such as Parallel Virtual Machine was a major factor in the development of our Virtual Process Topology Environment. This parallel programming environment provides high level abstractions for interprocess communication, relieving the application developer of the cumbersome task of mapping logical neighbours to their task identifiers within message passing systems. The novel approach of separating topological specification from the APIs provided extreme flexibility to the developers of the applications using regular topologies. We believed that the task of supporting process topologies could be made even easier and, in this paper present our new method which uses recurrence relations to define topologies. Within the new environment, the recurrence relationships can be passed to the topology server which then is used in the generation of the topological specification.","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131503827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Investigation to make best use of LSF with high efficiency","authors":"F. Costen, J. Brooke, M. Pettipher","doi":"10.1109/IWCC.1999.810827","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810827","url":null,"abstract":"A network of commodity workstations offers an effective platform for large scale parallel processing. The effectiveness of such systems can be greatly improved by efficient cluster management software. In this paper, we focus upon the widely used load sharing facility (LSF), and investigate its ability to run both parallel and serial simulations and increase the throughput of jobs. The experiment shows that LSF does a good job in terms of balancing the load for serial jobs and avoiding the machines with high processor utilization rates. Using the result of the experiment, we created a useful diagnostic tool for assessing the impact of load balancing software.","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127371944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and analysis of the Alliance/University of New Mexico Roadrunner Linux SMP SuperCluster","authors":"David A. Bader, A. Maccabe, Jason R. Mastaler, J. McIver, P. Kovatch","doi":"10.1109/IWCC.1999.810804","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810804","url":null,"abstract":"This paper discusses high performance clustering from a series of critical topics: architectural design, system software infrastructure, and programming environment. This is accomplished through an overview of a large scale, high performance SuperCluster (Roadrunner). This SuperCluster is based almost entirely on freely available, vendor-independent software: for example, its operating system (Linux), job scheduler (PBS), compilers (GNU/EGCS), and parallel programming libraries (MPI). The Globus toolkit, also available for this platform allows high performance distributed computing applications to use geographical distributed resources such as this SuperCluster. In addition to describing the design and analysis of the Roadrunner SuperCluster we provide experimental analyses from grand challenge applications and future directions for SuperClusters.","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125918754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative performance of a commodity Alpha cluster running Linux and Windows NT","authors":"D. Lancaster, Kenji Takeda","doi":"10.1109/IWCC.1999.810805","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810805","url":null,"abstract":"Using a cluster of commodity Alpha processors, we compare two software platforms based on Linux and Windows NT and intended to support intensive scientific computations. Networking and compiler performance are separately analysed and then results for NAS parallel benchmarks are given. We find that a compiler able to make good use of the cache is more important than low network latency in obtaining high performance. We argue that for all types of cluster, the choice of compiler is critical in selecting a cost effective platform for computationally intensive scientific application.","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127824295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Massively parallel simulated annealing embedded with downhill-a SPMD algorithm for cluster computing","authors":"Zhihui Du, Sanli Li, Shuyou Li, Mengyue Wu, Jing Zhu","doi":"10.1109/IWCC.1999.810899","DOIUrl":"https://doi.org/10.1109/IWCC.1999.810899","url":null,"abstract":"Simulated Annealing (SA) is a frequently used stochastic algorithm to deal with combinatorial optimization problems and it converges with probability infinitely close to 1. SA is an NP algorithm and the long executive time prevents it from being accepted for many real-time applications. This paper presents a SPMD (Single Program Multiple Data) algorithm which combines SA with local searching algorithm-downhill. The hybrid method not only keeps the convergence of SA but also improves the convergence speed of SA. Approximate solutions can be found quickly for complex optimization problems and more precise solutions can also be found by employing the same algorithm to fine-tune the approximate solutions. SA is an essential serial algorithm, but the SPMD algorithm breaks up the serial bottleneck of SA and its performance scales up linearly with the increase of processors, at the same time, the SPMD algorithm does not require careful choice of control parameters. Application cases show that the algorithm is robust and it can find high quality solution with high speed.","PeriodicalId":276367,"journal":{"name":"ICWC 99. IEEE Computer Society International Workshop on Cluster Computing","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121721319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}