{"title":"Mixed mode matrix multiplication","authors":"Meng-Shiou Wu, S. Aluru, R. Kendall","doi":"10.1109/CLUSTR.2002.1137747","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137747","url":null,"abstract":"In modern clustering environments where the memory hierarchy has many layers (distributed memory, shared memory layer, cache, ...), an important question is how to fully utilize all available resources and identify the most dominant layer in certain computation. When combining algorithms on all layers together, what would be the best method to get the best performance out of all the resources we have? The mixed mode programming model that uses thread programming on the shared memory layer and message passing programming on the distributed memory layer is a method that many researchers are using to utilize the memory resources. We take an algorithmic approach that uses matrix multiplication as a tool to show how cache algorithms affect the performance of both shared memory and distributed memory algorithms. We show that with good underlying cache algorithm, overall performance is stable. When the underlying cache algorithm is bad, superlinear speedup may occur and increasing number of threads may also improve performance.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"102 1","pages":"195-203"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78094937","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance and implementation of distributed data CPHF and SCF algorithms","authors":"Y. Alexeev, Michael W. Schmidt, T. Windus, M. Gordon, R. Kendall","doi":"10.1109/CLUSTR.2002.1137738","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137738","url":null,"abstract":"This paper describes a novel distributed data parallel self consistent field (SCF) algorithm and the distributed data coupled perturbed Hartree-Fock (CPHF) step of an analytic Hessian algorithm. The distinguishing features of these algorithms are: (a) columns of density and Fock matrices are distributed among processors, (b) pairwise dynamic load balancing and an efficient static load balancer were developed to achieve a good workload, and (c) network communication time is minimized via careful analysis of data flow in the SCF and CPHF algorithms. By using a shared memory model, novel work load balancers, and improved analytic Hessian steps, we have developed codes that achieve superb performance. The performance of the CPHF code is demonstrated on a large biological system.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"2 1","pages":"135-142"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81241578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallelization technique that improves performance and cluster utilization efficiency for heterogeneous clusters of workstations","authors":"Gerardo Díaz-Cuéllar, David A. Garza-Salazar","doi":"10.1109/CLUSTR.2002.1137756","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137756","url":null,"abstract":"We present a new parallelization technique that significantly improves performance of certain data-parallel algorithms on heterogeneous clusters of workstations. The two main goals of our technique are to improve execution times (compared to traditional parallelization techniques) and to efficiently use the computing resources available in the cluster. The technique is based on a pre-processing phase where information about the cluster is obtained, a load balanced data decomposition is derived, and information is generated to guide the cluster node utilization during the execution of the parallel algorithm. We applied our technique to Gaussian Elimination and Pairwise Interaction problems; the experiments show speedup improvements up to 133% and 275% respectively and the cluster utilization efficiency improves up to 180% and 300% when compared to traditional parallelization techniques.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"366 1","pages":"275-283"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80364642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An agent-based infrastructure for parallel Java on heterogeneous clusters","authors":"J. Al-Jaroodi, N. Mohamed, Hong Jiang, D. Swanson","doi":"10.1109/CLUSTR.2002.1137724","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137724","url":null,"abstract":"In this paper, we introduce an agent-based infrastructure that provides software services and functions for developing and deploying high performance programming models and applications on clusters. A Java-based prototype, based on this architecture, has been developed. Since this system is written completely in Java, it is portable and allows executing programs in parallel across multiple heterogeneous platforms. With the agent-based infrastructure, users need not deal with the mechanisms of deploying and loading user classes on the heterogeneous cluster. Moreover, details of scheduling, controlling, monitoring, and executing user jobs are hidden. In addition, the management of system resources is made transparent to the user. Such uniform services, when rendered available in a ubiquitous manner, are essential for facilitating the development and deployment of scalable high performance Java applications on clusters. An initial deployment over a heterogeneous, distributed cluster results in significantly enhanced performance; absolute performance compared to C (MPI) improves with increased granularity of the algorithms.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"141 1","pages":"19-27"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80418224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supermon: a high-speed cluster monitoring system","authors":"M. Sottile, R. Minnich","doi":"10.1109/CLUSTR.2002.1137727","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137727","url":null,"abstract":"Supermon is a flexible set of tools for high speed, scalable cluster monitoring. Node behavior can be monitored much faster than with other commonly used methods (e.g., rstatd). In addition, Supermon uses a data protocol based on symbolic expressions (S-expressions) at all levels of Supermon, from individual nodes to entire clusters. This contributes to Supermon's scalability and allows it to function in a heterogeneous environment. This paper presents the Supermon architecture and discusses initial performance measurements on a cluster of heterogeneous Alpha-processor based nodes.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"112 1","pages":"39-46"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87660765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JESSICA2: a distributed Java Virtual Machine with transparent thread migration support","authors":"Wenzhang Zhu, Cho-Li Wang, F. Lau","doi":"10.1109/CLUSTR.2002.1137770","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137770","url":null,"abstract":"A distributed Java Virtual Machine (DJVM) spanning multiple cluster nodes can provide a true parallel execution environment for multi-threaded Java applications. Most existing DJVMs suffer from the slow Java execution in interpretive mode and thus may not be efficient enough for solving computation-intensive problems. We present JESSICA2, a new DJVM running in JIT compilation mode that can execute multi-threaded Java applications transparently on clusters. JESSICA2 provides a single system image (SSI) illusion to Java applications via an embedded global object space (GOS) layer. It implements a cluster-aware Java execution engine that supports transparent Java thread migration for achieving dynamic load balancing. We discuss the issues of supporting transparent Java thread migration in a JIT compilation environment and propose several lightweight solutions. An adaptive migrating-home protocol used in the implementation of the GOS is introduced. The system has been implemented on x86-based Linux clusters and significant performance improvements over the previous JESSICA system have been observed.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"54 1","pages":"381-388"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85440169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kernel-level caching for optimizing I/O by exploiting inter-application data sharing","authors":"M. Vilayannur, M. Kandemir, A. Sivasubramaniam","doi":"10.1109/CLUSTR.2002.1137775","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137775","url":null,"abstract":"With applications becoming larger and the increasing load on high performance systems, it is important to tackle the I/O bottleneck problem from several angles. It is not only essential to optimize the I/O accesses of any one application, but also to be able to identify and exploit opportunities resulting from the sharing of datasets across applications. Clusters are rapidly becoming the platform of choice for demanding applications due to their cost-effectiveness and widespread deployment. Consequently, this paper attempts to optimize data sharing across applications concurrently executing on the cluster. Specifically, we propose and implement a kernel-level caching module at each node of a Linux cluster that can be used to service several processes of different applications. Using detailed evaluations on an actual Linux cluster this paper demonstrates the benefits of this module in optimizing intra and inter-application I/O requests.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"16 1","pages":"425-432"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78755707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive message management using hybrid channel model in parallel file system","authors":"Joon-Hyung Hwangbo, Sang-Ki Lee, Yoon-Young Lee, Dae-Wha Seo","doi":"10.1109/CLUSTR.2002.1137752","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137752","url":null,"abstract":"A parallel file system is utilized for supporting an excessive file request resulted from a parallel application in a cluster system. It uses traditional communication protocols like TCP/IP or UDP/IP that were designed for Wide Area Networks (WANs). For a cluster system, however, these protocols are inappropriate for its large scale of network overhead. In accordance with this problem, we propose a Hybrid Channel Model (HCM) for inter-cluster communication protocol. In a parallel file system, messages can be classified as control messages and file data block. Therefore, we divided a message channel into two parts, a control message channel and data channel. The first is used for transferring the control messages, while the last is used for transferring the file data blocks. For the message channel, TCP/IP is used as a communication protocol, and Virtual Interface Architecture (VIA) is used for data blocks. In tests, the proposed channel model exhibited a considerably improved performance.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"20 1","pages":"239-244"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74018590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using idle disks in a cluster as a high-performance storage system","authors":"J. Hansen, Renaud Lachaiz","doi":"10.1109/CLUSTR.2002.1137774","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137774","url":null,"abstract":"In many clusters today, the local disks of a node are only used sporadically. This paper describes the software support for sharing of disks in clusters, where the disks are distributed across the nodes in the cluster, thereby allowing them to be combined into a high-performance storage system. Compared to centralized storage servers, such an architecture allows the total I/O capacity of the cluster to scale up with the number of nodes and disks. Additionally, our software allows customizing the functionality of the remote disk access using a library of code modules. A prototype has been implemented on a cluster connected by a Scalable Coherent Interface (SCI) network and performance measurements using both raw device access and a distributed file system show that the performance is comparable to dedicated storage systems and that the overhead of the framework is moderate even during high load. Thus, the prospects are that clusters sharing disks distributed among the nodes will allow both the application processing power and total I/O capacity of the cluster to scale up with the number of nodes.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"1 1","pages":"415-424"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75713216","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cluster based hybrid hash join: analysis and evaluation","authors":"E. Schikuta, Peter Kirkovits","doi":"10.1109/CLUSTR.2002.1137783","DOIUrl":"https://doi.org/10.1109/CLUSTR.2002.1137783","url":null,"abstract":"The join is the most important, but also the most time consuming operation in relational database systems. We implemented the parallel hybrid hash join algorithm on a PC-cluster architecture and analyzed its performance behavior. We show that off-the-shelf cost saving cluster systems can build a viable platform for parallel database systems.","PeriodicalId":92128,"journal":{"name":"Proceedings. IEEE International Conference on Cluster Computing","volume":"49 1","pages":"461-466"},"PeriodicalIF":0.0,"publicationDate":"2002-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77748692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}