{"title":"Towards asynchronous metacomputing in MPI","authors":"A. Sodan","doi":"10.1109/HPCSA.2002.1019158","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019158","url":null,"abstract":"Metacomputing so far has been done mostly in more or less static configurations. However, applications with dynamic irregular behavior are increasing in significance and the computing platforms more often are time-sharing environments with varying system load. Thus, possibilities for dynamic connection and dynamic workload migration are becoming important. The paper discusses an approach to perform asynchronous workload balancing using the standard parallel library MPI. MPI and threads typically live in more or less separated worlds and the thread extension of MPI-2 is mainly meant to exploit more efficiently per SMP node within a model which is still mostly SPMD. We have extended MPI by dynamic mechanisms to automatically balance workload on the basis of threads and dynamic status/resource monitoring. Our extended library TeMPI is designed to run in a minimum version with MPICH and thus MPICH-G2 in the Globus grid environment.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115291551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational chemistry applications: performance on high-end and commodity-class computers","authors":"M. Guest, P. Sherwood","doi":"10.1109/HPCSA.2002.1019173","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019173","url":null,"abstract":"Beowulf clusters, on face value, offer the potential of a viable cost effective alternative for the provision of High Performance Computing. In this paper we compare the performance of Beowulf clusters built from commodity \"off the shelf\" components in the support of major research and production codes, with current high-end hardware such as the IBM SP, Compaq AlphaServer SC and SGI Origin 3800. The results concentrate on the application area of computational chemistry. Benchmark data on six commodity-based systems (CS1-CS6) featuring Intel, AMD Athlon and Alpha CPU architectures coupled to traditional Beowulf interconnect, such as Myrinet and Ethernet, are presented. Furthermore, we provide performance data on systems utilising the Quadrics QSNet interconnect technology, and initial results from a prototype of the Cray Supercluster.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"AES-3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126496313","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Numerical applications and sub-word parallelism: the NAS benchmarks on a Pentium 4","authors":"E. Daniel","doi":"10.1109/HPCSA.2002.1019156","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019156","url":null,"abstract":"We examine the impact of Pentium 4 SIMD instructions on the Fortran and C versions of the NAS benchmarks, either by compiler vectorization or by assembly code in-lining. Although few functions generally profit from the SIMD operations, those using complex numbers or random number generators can be efficiently accelerated.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125469192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Natural language processing complexity and parallelism","authors":"C. Moghrabi, S. Moussa, M. Eid","doi":"10.1109/HPCSA.2002.1019168","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019168","url":null,"abstract":"This paper reviews the processes involved in Natural Language Processing (NLP). It then demonstrates the various kinds of choices that need be taken during the execution of the word morphology, the syntactic text analysis, or text generation components. It compares the time complexity of traditional serial algorithms and examines the possible expected gain in some corresponding parallel counterparts.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123841152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bandwidth efficient tamper detection for distributed Java systems","authors":"M. Jochen, L. Marvel, L. Pollock","doi":"10.1109/HPCSA.2002.1019165","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019165","url":null,"abstract":"The benefits of distributed computation present complex security considerations beyond those associated with the traditional computing paradigm. This paper describes a bandwidth efficient approach to authenticate distributed Java code. Our system utilizes steganographic techniques to embed a cryptographic checksum as a tamper detection mark into Java class files. The properties of this mark make our system desirable in applications where low bandwidth utilization is a requirement (e.g., wireless networks and low power devices). We implemented our system in Java and evaluated its performance through an empirical study. The analysis indicates that our system detects any degree of alteration to a marked Java class file and can do so within a reasonable amount of time.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116778642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special workshop on: high performance computer simulation of cooperative phenomena","authors":"K. De'Bell, J. Whitehead","doi":"10.1109/HPCSA.2002.1019175","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019175","url":null,"abstract":"In a wide variety of physical systems cooperative phenomena resulting from interactions at the atomic or molecular levels give rise to structures on mesoscopic to macroscopic length scales. The problem of calculating the properties of such systems from simulations based on mathematical models is computationally intense because of the range of length scales and length of time that must be included. This session discusses algorithms and “best practices” for the use of high performance computing to simulate these systems and will include several examples of applications to specific systems.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129578123","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-adaptive parallel processing architecture for high-speed networking","authors":"J. Foag, Nuria Pazos, T. Wild, W. Brunnbauer","doi":"10.1109/HPCSA.2002.1019485","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019485","url":null,"abstract":"This article describes a packet processing methodology, which executes the required protocol layer functions of a networking device in parallel. Based on the dynamic prediction of the inherent protocol-stack of receiving packets, the data dependency of the layers is speculatively resolved and the functions are processed. Consequently, packet processing latency can be minimized and end-to-end transmission delays can be optimized without sacrificing flexibility in the supported networking protocols and applications. Utilizing the self-adaptive processing methodology, presented in this paper, transmission systems may realize a mean system processing time reduction and consequently a transmission rate increase of up to 40 percent dependent on the quality of the prediction.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123582906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explicit loop scheduling in OpenMP for parallel automatic differentiation","authors":"H. M. Bücker, Bruno Lang, A. Rasch, Christian H. Bischof, Dieter an Mey","doi":"10.1109/HPCSA.2002.1019144","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019144","url":null,"abstract":"Derivatives of almost arbitrary functions can be evaluated efficiently by automatic differentiation whenever the functions are given in the form of computer programs in a high-level programming language such as Fortran, C, or C++. In contrast to numerical differentiation, where derivatives are only approximated, automatic differentiation generates derivatives that are accurate up to machine precision. Sophisticated software tools implementing the technology of automatic differentiation are capable of automatically generating code for the product of the Jacobian matrix and a so-called seed matrix. It is shown how these tools can benefit from concepts of shared memory programming to parallelize, in a completely mechanical fashion, the gradient operations associated with each statement of the given code. The feasibility of our approach is demonstrated by numerical experiments. They were performed with a code that was generated automatically by the Adifor system and augmented with OpenMP directives.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117340310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeny in a Beowulf","authors":"A. Macks","doi":"10.1109/HPCSA.2002.1019133","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019133","url":null,"abstract":"A Beowulf cluster was built with a large number of different operating systems and processor architectures, in a variety of speed classes. The successful completion of the system confirms the viability of such heterogeneous Beowulf clusters.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125690610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the design of scalable pipelined broadcasting for mesh networks","authors":"A. Al-Dubai, M. Ould-Khaoua","doi":"10.1109/HPCSA.2002.1019140","DOIUrl":"https://doi.org/10.1109/HPCSA.2002.1019140","url":null,"abstract":"Minimising the communication latency and achieving considerable scalability are of paramount importance when designing high performance broadcast algorithms. Many algorithms for wormhole-switched meshes have been widely reported in the literature. However, most of these algorithms handle broadcast in a sequential manner and do not scale well with the network size. As a consequence, many parallel applications cannot be efficiently supported using existing algorithms. Motivated by these observations, this paper presents a new broadcast algorithm for the all-port mesh networks. The unique feature of the proposed algorithm is its capability of handling broadcast in only one message-passing step irrespective of the network size. Results from a comparative analysis and simulation reveal that the proposed algorithm exhibits superior performance characteristics over those of the well-known Recursive Doubling, Extending Dominating Node and Network Partitioning algorithms.","PeriodicalId":111862,"journal":{"name":"Proceedings 16th Annual International Symposium on High Performance Computing Systems and Applications","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2002-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134590500","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}