{"title":"Parallel preconditioning and approximation inverses on the Connection Machine","authors":"M. Grote, H. Simon","doi":"10.1109/SHPCC.1992.232685","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232685","url":null,"abstract":"The authors present a new approach to preconditioning for very large, sparse, non-symmetric, linear systems. It explicitly computes an approximate inverse to the original matrix that can be applied most efficiently for iterative methods on massively parallel machines. The algorithm and its implementation on the Connection Machine CM-2 are discussed in detail and supported by timings obtained from real problem data.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128428520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive methods and rectangular partitioning problem","authors":"C. Ozturan, B. Szymanski, J.E. Flaherthy","doi":"10.1109/SHPCC.1992.232665","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232665","url":null,"abstract":"Partitioning problems for rectangular domains having nonuniform workload for mesh-connected SIMD architectures are discussed. The considered rectangular workloads result from application of adaptive methods to the solution of hyperbolic differential equations on SIMD machines. A new form of the partitioning problem is defined in which sub-meshes of processors are assigned to tasks, each task being a discretized rectangular sub-domain. The work per processor (i.e. the work density) is balanced among the K sub-rectangular meshes of processors. First, a formalization of the 1D problem is given and a O(Kn/sup 3/) time and (Kn/sup 2/) space optimal algorithm is proposed. A more efficient heuristic algorithm is also given for the 1D problem. Finally 2D heuristics are developed by projecting the weights on to a 1D array.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130376441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiler optimizations for distributed-memory programs","authors":"Rajesh K. Gupta","doi":"10.1109/SHPCC.1992.232651","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232651","url":null,"abstract":"The single-program multiple-data (SPMD) mode of execution is an effective approach for exploiting parallelism in programs written using the shared-memory programming model on distributed memory machines. However, during SPMD execution one must consider dependencies due to the transfer of data among the processors. Such dependencies can be avoided by reordering the communication operations (sends and receives). However, no formal framework has been developed to explicitly recognize the represent such dependencies. The author identifies two types of dependencies, namely communication dependencies and scheduling dependencies, and proposes to represent these dependencies explicitly in the program dependency graph. Next, he presents program transformations that use this dependency information in transforming the program and increasing the degree of parallelism exploited. Finally, the author presents program transformations that reduce communication related run-time overhead.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124096231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A look at scalable dense linear algebra libraries","authors":"J. Dongarra, R. V. D. Geijn, D. Walker","doi":"10.1109/SHPCC.1992.232670","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232670","url":null,"abstract":"Discusses the essential design features of a library of scalable software for performing dense linear algebra computations on distributed memory concurrent computers. The square block scattered decomposition is proposed as a flexible and general-purpose way of decomposing most, if not all, dense matrix problems. An object-oriented interface to the library permits more portable applications to be written, and is easy to learn and use, since details of the parallel implementation are hidden from the user. Experiments on the Intel Touchstone Delta system with a prototype code that uses the square block scattered decomposition to perform LU factorization are presented and analyzed. It was found that the code was both scalable and efficient, performing at about 14 GFLOPS (double precision) for the largest problem considered.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126850458","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsteady flow simulation using an MIMD computer","authors":"S. Palaniswamy, S. Chakravarthy","doi":"10.1109/SHPCC.1992.232675","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232675","url":null,"abstract":"Numerical simulations of unsteady flows require algorithms with high order of accuracy in both time and space and correspondingly vast computer resources, because flow properties may have to be computed for large times before significant information can be extracted from them. Massively parallel computers are particularly well suited for these simulations if the computational domain could be mapped on to the processors while maintaining high efficiency and time synchronization between the nodes of the MIMD computer. Certain aspects of the implementation of a time-accurate algorithm to solve the Navier-Stokes equations on structured grids, using a massively parallel processor, are presented in this paper along with results for two problems: (1) the changing characteristics of the near-wake flow behind a cylinder as a function of Reynolds number and (2) dynamics of vortex pairing in free-shear layers.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123371818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental mapping for solution-adaptive multigrid hierarchies","authors":"J. De Keyser, D. Roose","doi":"10.1109/SHPCC.1992.232666","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232666","url":null,"abstract":"The full multigrid method uses a hierarchy of successively finer grids. In a solution-adaptive grid hierarchy each grid is obtained by adaptive refinement of the grid on the previous level. On a distributed memory multiprocessor, each grid level must be partitioned and mapped so as to minimize the multigrid cycle execution time. In this report, several grid partitioning and load (re)mapping strategies that deal with this problem are compared. The influence of the type of multigrid cycle is examined. Results obtained on an iPSC hypercube are reported.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123626310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Load balancing and parallel implementation of iterative algorithms for row-continuous Markov chains","authors":"M. Colajanni, M. Angelaccio","doi":"10.1109/SHPCC.1992.232656","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232656","url":null,"abstract":"Presents the first parallel algorithms for solving row-continuous or generalized birth-death (GBD) Markov chains on distributed memory MIMD multiprocessors. These systems are characterized by very large transition probability matrices, decomposable in heterogeneous tridiagonal blocks. The parallelization of three aggregation/disaggregation iterative methods is carried out by a unique framework that keeps into account the special matrix structure. Great effort has been also devoted to define a general algorithm for approximating the optimum workload. Various computational experiments show that Vantilborgh's (1985) method is the fastest of the three algorithms on any data set dimension.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127643919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Applications of a parallel pressure-correction algorithm to 3D turbomachinery flows","authors":"M. Braaten","doi":"10.1109/SHPCC.1992.232657","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232657","url":null,"abstract":"A parallel algorithm for the solution of three-dimensional compressible flows in turbomachinery has been developed and demonstrated on a scalable distributed memory multicomputer. The algorithm solves the compressible form of the Euler or Navier-Stokes equations via a compressible pressure correction formulation. To achieve high accuracy for highly turning blade rows, the computational grid is constructed without requiring strict periodicity of the grid points along the periodic boundaries between the blade passages. The impact of this feature on code parallelization and computational efficiency is described. The algorithm has been demonstrated on up to 128 processors of an Intel iPSC/860. Performance 2.4 times faster than a single Cray Y-MP processor has been achieved for an inviscid turbomachinery calculation on 154000 grid points with 128 processors of the iPSC/860.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116292540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data alignment: transformations to reduce communication on distributed memory architectures","authors":"M. O’Boyle, G. A. Hedayat","doi":"10.1109/SHPCC.1992.232671","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232671","url":null,"abstract":"The relative storage, or alignment, of array data in distributed memory critically determines the amount of communication overhead. This paper expresses data alignment in a linear algebraic framework. Aligned data can be viewed as forming a hyperplane in the iteration space. This allows the quantification of data alignment and the determination of the existence of transformations and the determination of the existence of transformations to reduce nonlocal access. This has led to a new alignment transformation which is applicable to a wider class of problems than existing techniques. The global impact of such transformations are discussed as is the effect of alignment on partitioning.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126642767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HeNCE: graphical development tools for network-based concurrent computing","authors":"A. Beguelin, J. Dongarra, Alexander Geist, R. Manchek, Keith Moore, Reed Wade, V. Sunderam","doi":"10.1109/SHPCC.1992.232678","DOIUrl":"https://doi.org/10.1109/SHPCC.1992.232678","url":null,"abstract":"HeNCE (heterogeneous network computing environment) is an X Window based graphical parallel programming environment that was created to assist scientists and engineers with the development of parallel programs. HeNCE provides a graphical interface for creating, compiling, executing, and debugging parallel programs, as well as configuring a distributed virtual computer (using PVM). HeNCE programs can be run on a single Unix workstation or over a network of heterogeneous machines. The paper describes the purpose and use of the HeNCE software.<<ETX>>","PeriodicalId":254515,"journal":{"name":"Proceedings Scalable High Performance Computing Conference SHPCC-92.","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128052150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}