Manish Gupta, S. Midkiff, E. Schonberg, V. Seshadri, David Shields, Ko-Yang Wang, Wai-Mee Ching, T. Ngo
{"title":"An HPF Compiler for the IBM SP2","authors":"Manish Gupta, S. Midkiff, E. Schonberg, V. Seshadri, David Shields, Ko-Yang Wang, Wai-Mee Ching, T. Ngo","doi":"10.1145/224170.224422","DOIUrl":"https://doi.org/10.1145/224170.224422","url":null,"abstract":"We describe pHPF, a research prototype HPF compiler for the IBM SP series parallel machines. The compiler accepts as input Fortran 90 and Fortran 77 programs, augmented with HPF directives; sequential loops are automatically parallelized. The compiler supports symbolic analysis of expressions. This allows parameters such as the number of processors to be unknown at compile-time without significantly affecting performance. Communication schedules and computation guards are generated in a parameterized form at compile-time. Several novel optimizations and improved versions of well-known optimizations have been implemented in pHPF to exploit parallelism and reduce communication costs. These optimizations include elimination of redundant communication using data-availability analysis; using collective communication; new techniques for mapping scalar variables; coarse-grain wavefronting; and communication reduction in multi-dimensional shift communications. We present experimental results for some well-known benchmark routines. The results show the effectiveness of the compiler in generating efficient code for HPF programs.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122819487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, M. Lam
{"title":"Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler","authors":"Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, M. Lam","doi":"10.1145/224170.224337","DOIUrl":"https://doi.org/10.1145/224170.224337","url":null,"abstract":"This paper presents an extensive empirical evaluation of an interprocedural parallelizing compiler, developed as part of the Stanford SUIF compiler system. The system incorporates a comprehensive and integrated collection of analyses, including privatization and reduction recognition for both array and scalar variables, and symbolic analysis of array subscripts. The interprocedural analysis framework is designed to provide analysis results nearly as precise as full inlining but without its associated costs. Experimentation with this system shows that it is capable of detecting coarser granularity of parallelism than previously possible. Specifically, it can parallelize loops that span numerous procedures and hundreds of lines of code, frequently requiring modifications to array data structures such as privatization and reduction transformations. Measurements from several standard benchmark suites demonstrate that an integrated combination of interprocedural analyses can substantially advance the capability of automatic parallelization technology.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117136880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Where is the Supercomputer Software Revolution?","authors":"Dennis Gannon, L. Smarr, V. Schuster","doi":"10.1145/224170.224507","DOIUrl":"https://doi.org/10.1145/224170.224507","url":null,"abstract":"","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"129 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115890015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing Computational Science Curricula: The EarthVision Experience","authors":"Ralph K. Coppola, E. Toth","doi":"10.1145/224170.224202","DOIUrl":"https://doi.org/10.1145/224170.224202","url":null,"abstract":"Technology is used to empower students to go beyond traditional limitations. EarthVision provides the opportunity to participate in an authentic research environment, which enables the students to develop a sense of self worth and esteem established in the context of a phased curriculum, bringing together experts in a variety of disciplines. New techniques such as modeling and scientific visualization are employed to expand the types of phenomena which are possible to examine at a high school level. The use of concept strands going from simple elements to complicated representations helps to move the teacher/student teams from a highly structured learning environment to one that is highly independent. The scientific method, which employs validation throughout the computational science process, brings rigor and integrity which stimulates skill development needed for the development of autonomy. The result is significant cognitive development coupled with a positive affective orientation.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124845002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance of a Parallel Global Atmospheric Chemical Tracer Model","authors":"J. Demmel, Sharon L. Smith","doi":"10.1145/224170.224504","DOIUrl":"https://doi.org/10.1145/224170.224504","url":null,"abstract":"As part of a NASA HPCC Grand Challenge project, we are designing and implementing a parallel atmospheric chemical tracer model that will be suitable for use in global simulations. To accomplish this goal, our starting point has been an atmospheric pollution model that was originally used to study pollution in the Los Angeles Basin. The model includes gas-phase and aqueous-phase chemistry, radiation, aerosol physics, advection, convection, deposition, visibility and emissions. The potential bottlenecks in the model for parallel implementation are the compute-intensive ODE solving phase with load balancing problems, and the communication-intensive advection phase. We describe the implementation and performance results on a variety of platforms, with emphasis on a detailed performance model we developed to predict performance, identify bottlenecks, guide our implementation, assess scalability, and evaluate architectures. An atmospheric chemical tracer model such as the one we describe in this paper will be one component of a larger Earth Systems Model (ESM), being developed under the direction of C. R. Mechoso of UCLA, incorporating atmospheric dynamics, atmospheric physics, ocean dynamics, and a database and visualization system.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121759414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Bernard, C. DeTar, S. Gottlieb, U. Heller, J. Hetrick, N. Ishizuka, L. Kärkkäinen, S. Lantz, K. Rummukainen, R. Sugar, D. Toussaint, M. Wingate
{"title":"Lattice QCD on the IBM Scalable POWERParallel Systems SP2","authors":"C. Bernard, C. DeTar, S. Gottlieb, U. Heller, J. Hetrick, N. Ishizuka, L. Kärkkäinen, S. Lantz, K. Rummukainen, R. Sugar, D. Toussaint, M. Wingate","doi":"10.1145/224170.224307","DOIUrl":"https://doi.org/10.1145/224170.224307","url":null,"abstract":"A 512 node IBM Scalable POWERParallel Systems SP2 was installed at the Cornell Theory Center in October 1994. During the past couple of months we have been porting and optimizing code for carrying out lattice QCD calculations. Present performance is far from ideal, however, and optimization efforts are still under way. The rate limiting step in our code involves a rather generic inversion of a large, sparse system, based on a partial differential equation in a multidimensional space. The insights we have gained so far may be useful in diagnosing performance in a wide class of applications.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127544778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel Incompressible Flow Solver Package with a Parallel Multigrid Elliptic Kernel","authors":"J. Lou, R. Ferraro","doi":"10.1145/224170.224406","DOIUrl":"https://doi.org/10.1145/224170.224406","url":null,"abstract":"A parallel time-dependent incompressible flow solver and a parallel multigrid elliptic kernel are described. The flow solver is based on a second-order projection method applied to a staggered finite-difference grid. The multigrid algorithms implemented in the elliptic kernel, which is needed by the flow solver, are V-cycle and full V-cycle schemes. A grid-partition strategy is used in the parallel implementations of both the flow solver and the multigrid elliptic kernel on all fine and coarse grids. Numerical experiments and parallel performance tests show the parallel solver package is numerically stable, physically robust and computationally efficient. Both the multigrid elliptic kernel and the flow solver scale very well to a large number of processors on the Intel Paragon and the Cray T3D for computations with moderate granularity. The solver package has been carefully designed and coded so that it can be easily adapted to solving a variety of interesting two and three-dimensional flow problems. The solver package is portable to parallel systems that support MPI, PVM and Intel NX for interprocessor communications.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128802120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational Approach to the Statistical Mechanics of Protein Folding","authors":"M. Hao, H. Scheraga","doi":"10.1145/224170.224216","DOIUrl":"https://doi.org/10.1145/224170.224216","url":null,"abstract":"A statistical mechanical approach to the protein folding problem is developed based on computer simulations. The properties of proteins related to conformation and folding are determined from the density of states of the protein. A new simulation procedure, the Entropy Sampling Monte Carlo method, is used to determine accurately the density of states of the protein. To enhance the efficiency of sampling the conformational space of a protein, two techniques (a conformational-biased chain regrowth procedure and a jump-walking method) were introduced into the simulation. Applications of the approach to study a number of model polypeptides and a small protein, Bovine Pancreatic Trypsin Inhibitor, have been carried out. The results obtained demonstrate that the new approach is more powerful and produces richer information about the thermodynamics and folding behavior of proteins than conventional simulation methods.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127435739","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Structured Approach to Instrumentation System Development and Evaluation","authors":"A. Waheed, D. Rover","doi":"10.1145/224170.224271","DOIUrl":"https://doi.org/10.1145/224170.224271","url":null,"abstract":"Software instrumentation is a widely used technique for parallel program performance evaluation, debugging, steering, and visualization. With increasing sophistication of parallel tool development technologies and broadening of application areas where these tools are being used, runtime data collection and management activities are growing in importance; we use the term instrumentation system (IS) to refer to components that support these activities in state-of-the-art parallel tool environments. An IS consists of Local Instrumentation Servers, an Instrumentation System Manager, and a Transfer Protocol. The overheads and perturbation effects attributed to an IS must be accounted for to ensure correct and efficient representation of program behavior, especially for on-line and real-time environments. Moreover, an IS is a key facilitator of integration of tools in an environment. In this paper, we define the primary components of an IS and their roles in an integrated environment, and classify ISs according to selected features. We introduce a structured approach to plan, design, model, evaluate, implement, and validate an IS. The approach provides a means to formally address domain-specific requirements. The modeling and evaluation processes are illustrated in the context of three distinctive IS case studies for PICL, Paradyn, and Vista. Valuable feedback on performance effects of IS parameters and policies can assist developers in making design decisions early in the software development cycle. Additionally, use of structured software engineering methods can support the mapping of an abstract IS model to an implementation of the IS.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132655817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Symbolic Array Dataflow Analysis for Array Privatization and Program Parallelization","authors":"Junjie Gu, Zhiyuan Li, Gyungho Lee","doi":"10.1145/224170.224318","DOIUrl":"https://doi.org/10.1145/224170.224318","url":null,"abstract":"Array dataflow information plays an important role for successful automatic parallelization of Fortran programs. This paper proposes a powerful symbolic array dataflow analysis to support array privatization and loop parallelization for programs with arbitrary control flow graphs and acyclic call graphs. Our scheme summarizes array access information using guarded array regions and propagates such regions over a Hierarchical Supergraph (HSG). The use of guards allows us to use the information in IF conditions to sharpen the array dataflow analysis and thereby to handle difficult cases which elude other existing techniques. The guarded array regions retain the simplicity of set operations for regular array regions in common cases, and they enhance regular array regions in complicated cases by using guards to handle complex symbolic expressions and array shapes. Scalar values that appear in array subscripts and loop limits are substituted on the fly during the array information propagation, which disambiguates the symbolic values precisely for set operations. We present efficient algorithms that implement our scheme. Initial experiments of applying our analysis to Perfect Benchmarks show promising results of improved array privatization.","PeriodicalId":269909,"journal":{"name":"Proceedings of the IEEE/ACM SC95 Conference","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1995-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122841202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}