{"title":"Phase modeling of a parallel scientific code","authors":"P. Worley","doi":"10.1109/SHPCC.1992.232677","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232677","url":null,"abstract":"Describes a performance model for a parallel program that solves the nonlinear shallow water equations using the spectral transform method. The model is generated via a phase analysis, and consists of a sequence of simple models whose sum describes the performance of the entire code. This use of a sequence of simple models increases the range of validity of the model as the problem and machine parameters scale.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"64 9-10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131789737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A methodology for visualizing performance of loosely synchronous programs","authors":"S. Sarukkai, D. Kimelman, L. Rudolph","doi":"10.1109/SHPCC.1992.232663","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232663","url":null,"abstract":"Introduces a new set of views for displaying the progress of loosely synchronous computations involving large numbers of processors on large problems. The authors suggest a methodology for employing these views in succession in order to gain progressively more detail concerning program behavior. At each step, focus is refined to include just those program sections or processors which have been determined to be bottlenecks. The authors present their experience in using this methodology to uncover performance problems in selected applications.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128647743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthesizing scalable computations from sequential programs","authors":"R. Govindaraju, B. Szymański","doi":"10.1109/SHPCC.1992.232639","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232639","url":null,"abstract":"Advocates an approach that supports decomposition and scalable synthesis of a parallel computation. The decomposition is achieved with the aid of annotation languages that enable one top annotate programs written in various programming languages. The authors have implemented annotations for the Equational Programming Language (EPL). The synthesis is achieved with the aid of a simple configuration language that describes the computation in terms of interactions of programs and their fragments created by annotations. The decomposition and synthesis simplify the process of: (1) determining the grain size for efficient parallel processing, (2) data distribution, and (3) run-time optimization. The authors discuss annotations and configurations suitable for parallel programs written in EPL and FORTRAN and their use in scalable synthesis. They first discuss how annotations can define different computational blocks from a single program and how these blocks determine data distributions across processors. They outline a design of the configurator and show how FORTRAN programs can be configured into a hierarchical structure of computational blocks. An example of LU decomposition written in both EPL and FORTRAN illustrates the process of program decomposition and synthesis. The authors discuss code generation for synthesized computations, and some possible extensions.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115836171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"METRICS: a tool for the display and analysis of mappings in message-passing multicomputers","authors":"V. Lo, K. Windisch, R. Datta","doi":"10.1109/SHPCC.1992.232647","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232647","url":null,"abstract":"METRICS is a software tool for the static (compile time) analysis of mappings. METRICS is designed for use in the mapping of parallel computations consisting of a set of communicating parallel processes which communicate through explicit message passing. The target architectures currently supported include the mesh and hypercube as well as user-defined topologies. The underlying routing schemes include store-and-forward, virtual cut-through, and wormhole routing. METRICS is designed to display the mapping in a clear, logical, and intuitive format so that the user can evaluate it quantitatively as well as visually. The contributions of METRICS include its rich underlying formalism, the temporal communication graph, a hybrid between the static task graph and the DAG; its mechanisms for handling massive parallelism using subviews, scrolling, and hierarchical grouping; and the broad spectrum of mapping metrics used in the analysis of each mapping.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125774660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluating parallel languages for molecular dynamics computations","authors":"T. Clark, R. V. Hanxleden, K. Kennedy, C. Koelbel, L.R. Scott","doi":"10.1109/SHPCC.1992.232682","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232682","url":null,"abstract":"The paper describes the practicalities of porting a basic molecular dynamics computation to a distributed-memory machine. In the process, it shows how program annotations can aid in parallelizing a moderately complex code. It also argues that algorithm replacement may be necessary in parallelization, a task which cannot be performed automatically. The paper closes with some results from a parallel GROMOS implementation.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"NS26 9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123423732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Portable execution traces for parallel program debugging and performance visualization","authors":"A. Couch, D.W. Krumme","doi":"10.1109/SHPCC.1992.232661","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232661","url":null,"abstract":"There is much interest in defining a standard for event traces collected from parallel architectures. A standard would support free data and tool sharing among researchers working on varied architectures. But defining that standard has proved to be difficult. Any standard must allow user-defined events and avoid or hide event semantics as much as possible. The authors propose a standard based on a declaration language, which describes how the raw event trace is to be translated into a normal form. Analysis tools then share a common interface to a compiler and interpreter which use the declarations to fetch, transform, and augment trace data. This concept is evaluated through construction of a prototype declaration compiler and interpreter.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128379401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Preliminary experience in developing a parallel thin-layer Navier Stokes code and implications for parallel language design","authors":"D. Olander, R. Schnabel","doi":"10.1109/SHPCC.1992.232631","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232631","url":null,"abstract":"Describes preliminary experience in developing a parallel version of a reasonably large, multi-grid based computational fluid dynamics code, and implementing this version on a distributed memory multiprocessor. Creating an efficient parallel code has involved interesting decisions and tradeoffs in the mapping of the key data structures to the processors. It also has involved significant reordering of computations in computational kernels, including the use of pipelining, to achieve good efficiency. The authors discuss these issues and their computational experiences with different alternatives, and briefly discuss the implications of these experiences upon the design of effective languages for distributed parallel computation.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116542108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vienna Fortran 90","authors":"S. Benkner, Barbara Chapman, H. Zima","doi":"10.1109/SHPCC.1992.232688","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232688","url":null,"abstract":"Vienna Fortran 90 is a language extension of Fortran 90 which enables the user to write programs for distributed memory multiprocessors using global data references only. Performance of software on such systems is profoundly influenced by the manner in which data is distributed to the processors. Hence, Vienna Fortran 90 provides the user with a wide range of facilities for the mapping of data to processors. It combines the advantages of the shared memory programming paradigm with mechanisms for explicit user control of those aspects of the program which have the greatest impact on efficiency. The paper presents the major features of Vienna Fortran 90 and gives examples of their use.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130847365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel programming tool for scheduling on distributed memory multiprocessors","authors":"T. Yang, A. Gerasoulis","doi":"10.1109/SHPCC.1992.232673","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232673","url":null,"abstract":"PYRROS is a tool for scheduling and parallel code generation for distributed memory message passing architectures. In this paper, the authors discuss several compile-time optimization techniques used in PYRROS. The scheduling part of PYRROS optimizes both data and program mapping so that the parallel time is minimized. The communication and storage optimization part facilitates the generation of efficient parallel codes. The related issues of partitioning and 'owner computes rule' are discussed and the importance of program scheduling is demonstrated.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132182941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data remapping for distributed-memory multicomputers","authors":"C. Chase, A. Reeves","doi":"10.1109/SHPCC.1992.232660","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232660","url":null,"abstract":"The fragmented memory model of distributed-memory multicomputers, such as the Intel iPSC and Paragon series of computers, and the Thinking Machines CM-5, introduces significant complexity into the compilation process. Since most conventional programming languages provide a model of a global memory, a distributed-memory compiler must translate all data references to correspond to the fragmented memory on the system hardware. This paper describes a technique called array remapping which can automatically be applied to parallel loops containing arbitrary array subscripts. The compile time and runtime aspects of providing support for remapping are described, and the performance of this implementation of remapping is presented.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"107 5-6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132193764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}