{"title":"The integrated library approach to parallel computing","authors":"A. Knies, G. Adams","doi":"10.1109/SPLC.1993.365558","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365558","url":null,"abstract":"This paper describes preliminary work on the development of a language implementation technique that integrates the compilation of user level code with a special hierarchical library. To demonstrate this system, we have designed a Fortran77-based high level language (called MF) that does not permit the programmer to manage work or data distribution. For the system to be effective, the programmer must make calls to the library. At each call site, the compiler is provided with information about potential distributions for each of the variables involved. Our approach is intended to be used to gather empirical data about the portability and efficiency of specific programs written in MF and how their distribution requirements vary from machine to machine.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114338877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using DASPK on the TMC CM5. Experiences with two programming models","authors":"Robert S. Maier, L. Petzold, W. Rath","doi":"10.1109/SPLC.1993.365569","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365569","url":null,"abstract":"DASPK solves large-scale systems of differential-algebraic equations. We have developed two parallel versions of DASPK. They are DASPKF90, a Fortran 90 data parallel implementation for the cmf compiler, and DASPKMP, a message-passing implementation written in Fortran 77 with extended BLAS. We describe the two codes, demonstrate the application of the codes on true large-scale examples, gave some timing results, and draw some conclusions based on our experiences.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121709087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed solution of sparse symmetric positive definite systems","authors":"M. Heath, P. Raghavan","doi":"10.1109/SPLC.1993.365576","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365576","url":null,"abstract":"We consider the solution of a linear system Ax=b on a distributed memory machine when the matrix A is large, sparse and symmetric positive definite. In a previous paper we developed an algorithm to compute a fill-reducing nested dissection ordering of A on a distributed memory machine. We now develop algorithms for the remaining steps of the solution process. The large-grain task parallelism resulting from sparsity is identified by a tree of separators available from nested dissection. Our parallel algorithms use this separator tree to estimate the structure of the Cholesky factor L and to organize numeric computations as a sequence of dense matrix operations. We present results of an implementation on an Intel iPSC/860 parallel computer.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133177959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Defining and measuring scalability","authors":"Edward A. Luke","doi":"10.1109/SPLC.1993.365568","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365568","url":null,"abstract":"The concept of scalability in parallel systems is a simple one: given a reasonable performance on a sample problem, a problem of increased workload can be solved with reasonable performance given a commensurate increase in computational resources. This definition does not afford the analytical precision that is required of any scientific classification system, since the terms of this definition are almost entirely subjective. Some attempts have been made to measure scalability, but many of the popular measurements do not eliminate subjective terms. For example: the fixed-time measurements that have been advanced do not specify a fixed-time constraint, and the scaled-speedup measurements do not specify initial workload. The problem with these measurements is that they depend on a subjective definition of \"reasonable performance\" and as a result are unreliable. An alternate definition of scalability can be found when scalability is defined as the ability to maintain cost effectiveness as workload grows. When this approach is considered, the subjective \"reasonable performance\" becomes replaced by an objective term of optimal cost effectiveness. Obviously the success of this approach depends highly on determining a cost effectiveness function that is relevant to scalability. This paper will introduce a cost effectiveness function and argue that the proposed cost effectiveness function is highly relevant to the goals of developing scalable systems.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"259 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123096628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing applications performance on Intel Paragon through dynamic memory allocation","authors":"S. Saini, H. Simon","doi":"10.1109/SPLC.1993.365561","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365561","url":null,"abstract":"The Paragon operating system (OS) supports virtual memory (VM). The OS manages virtual memory by performing two services. Firstly, paging-in service pages the execution code from the service node to the compute nodes. This includes the paging-in of empty data corresponding to statically allocated arrays. Secondly, paging-out service is performed by paging the unused part of the OSF server to the boot node to make space available for the user's execution code. These paging-in and paging-out activities take place simultaneously and drastically degrade the performance of the user code. We have investigated this problem in detail, and found that the dynamic allocation of memory completely eliminates the unnecessary and undesirable effects of paging-in empty data arrays from the service node to the compute nodes and thereby increases the performance of the applications considered in the present work by 30% to 40%.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116078821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Run-time recognition of task parallelism within the P++ parallel array class library","authors":"R. Parsons, D. Quinlan","doi":"10.1109/SPLC.1993.365580","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365580","url":null,"abstract":"This paper explores the use of a run-time system to recognize task parallelism within a C++ array class library. Run-time systems currently support data parallelism in P++, FORTRAN 90 D, and High Performance FORTRAN. But data parallelism is insufficient for many applications, including adaptive mesh refinement. Without access to both data and task parallelism such applications exhibit several orders of magnitude more message passing and poor performance. In this paper, a C++ array class library is used to implement deferred evaluation and run-time dependence for task parallelism recognition, to obtain task parallelism through a data flow interpretation of data parallel array statements. Performance results show that the analysis and optimizations are both efficient and practical, allowing us to consider more substantial optimizations.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134007297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Multicomputer Toolbox: current and future directions","authors":"A. Skjellum","doi":"10.1109/SPLC.1993.365578","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365578","url":null,"abstract":"The Multicomputer Toolbox is a set of \"first-generation\" scalable parallel libraries. The Toolbox includes sparse, dense, direct and iterative linear algebra, a stiff ODE/DAE solver, and an open software technology for additional numerical algorithms. The Toolbox has an object-oriented design; C-based strategies for classes of distributed data structures (including distributed matrices and vectors) as well as uniform calling interfaces are defined. At a high level in the Toolbox, data-distribution-independence (DDI) support is provided. DDI is needed to build scalable libraries, so that applications do not have to redistribute data before calling libraries. Data-distribution-independent mapping functions implement this capability. Data-distribution-independent algorithms are sometimes more efficient than fixed-data-distribution counterparts, because redistribution of data can be avoided. Underlying the system is a \"performance and portability layer,\" which includes interfaces to sequential BLAS, the Zipcode message passing system, and a minimal set of Unix-portability functions. In particular, the Zipcode system provides communication contexts, and process groups, collective operations, and virtual topologies, all needed for building efficient scalable libraries, and large-scale application software.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129844735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling groundwater flow on MPPs","authors":"S. F. Ashby, R. Falgout, S.G. Smith, A. Tompson","doi":"10.1109/SPLC.1993.365586","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365586","url":null,"abstract":"The numerical simulation of groundwater flow in three-dimensional heterogeneous porous media is examined. To enable detailed modeling of large contaminated sites, preconditioned iterative methods and massively parallel computing power are combined in a simulator called PARFLOW. After describing this portable and modular code, some numerical results are given, including one that demonstrates the code's scalability.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116994553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable libraries for Fortran 90D/High Performance Fortran","authors":"Z. Bozkus, A. Choudhary, G. Fox, T. Haupt, S. Ranka, R. Thakur, Jhy-Chun Wang","doi":"10.1109/SPLC.1993.365581","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365581","url":null,"abstract":"High Performance Fortran (HPF) is a new language, based on Fortran 90, developed by HPF Forum. The language was designed to support data parallel programming with top performance on MIMD and SIMD computers with non-uniform memory access costs. The main features of the language include the FORALL construct, new intrinsic functions and data distribution directives. A perusal of HPF shows that most of the parallelism is hidden in the runtime library. Further, efficient parallelization of FORALL construct and array assignment functions on distributed memory machines requires the use of collective communication to access non-local data. This communication could be structured (like shift, broadcast, all-to-all communication) or unstructured. Thus, the scalability of the code generated by the compiler depend on the scalability of these libraries. In this paper, we present the design and performance of an scalable library for the intrinsic functions and the collective communication library.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115358110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable, extensible, and portable numerical libraries","authors":"W. Gropp, Barry F. Smith","doi":"10.1109/SPLC.1993.365579","DOIUrl":"https://doi.org/10.1109/SPLC.1993.365579","url":null,"abstract":"Designing a scalable and portable numerical library requires consideration of many factors, including choice of parallel communication technology, data structures, and user interfaces. The PETSc library (Portable Extensible Tools for Scientific computing) makes use of modern software technology to provide a flexible and portable implementation. This paper discusses the use of a meta-communication layer (allowing the user to choose different transport layers such as MPI, p4, pvm, or vendor-specific libraries) for portability, an aggressive data-structure-neutral implementation that minimizes dependence on particular data structures (even vectors), permitting the library to adapt to the user rather than the other way around, and the separation of implementation language from user-interface language. Examples are presented.<<ETX>>","PeriodicalId":146277,"journal":{"name":"Proceedings of Scalable Parallel Libraries Conference","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1993-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126932599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}