{"title":"Solution of the 3-D Euler equations for the flow about a fighter aircraft configuration using a hypercube parallel processor","authors":"D. Weissbein, J. F. Mangus, M. W. George","doi":"10.1145/63047.63066","DOIUrl":"https://doi.org/10.1145/63047.63066","url":null,"abstract":"The Computational Fluid Dynamics (CFD) code FLO57, which solves the 3-D Euler Equations using an explicit, finite volume, Runge-Kutta algorithm, was implemented on an Intel IPSC-MX parallel processor. Spatial decomposition was effected on the solution grid about a fighter aircraft configuration and Binary Reflected Graycodes were used to map the computational domain onto the IPSC, ensuring nearest neighbor communication. Results and timings of the implementation are presented with a comparison of the IPSC and a uniprocessor machine of similar classification to assess the performance of the IPSC on FLO57. Suggested improvements to the current version of the parallelized code are listed to aid load balancing, vectorization, and more efficient memory use.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123253724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypercube data analysis in astronomy: optical interferometry and millisecond pulsar searches","authors":"P. Gorham, T. Prince, S. Anderson","doi":"10.1145/63047.63049","DOIUrl":"https://doi.org/10.1145/63047.63049","url":null,"abstract":"Astronomical data sets are beginning to live up to their name, in both their sizes and the complexity of the analysis required. Here we discuss two astronomical data analysis problems which we have begun to implement on a hypercube concurrent processor environment: The intensive image processing required in an optical interferometry project, and the large scale power spectral analysis required by a search for millisecond-period radio pulsars. In both cases the analysis proceeds largely in the Fourier domain, and we find that the problems are readily adapted to a concurrent environment. In the following report, we outline briefly the astronomical background for each problem, then discuss the general computational requirements, and finally present possible hypercube algorithms and results achieved to date.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126260391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Region growing on a hypercube multiprocessor","authors":"M. Willebeek-LeMair, A. Reeves","doi":"10.1145/63047.63057","DOIUrl":"https://doi.org/10.1145/63047.63057","url":null,"abstract":"The region growing paradigm for image segmentation groups neighboring pixels into regions depending upon a predetermined homogeneity criterion. A parallel method for region growing on an MIMD multiprocessor system is presented. Since the region growing problem exhibits non-uniform and unpredictable load fluctuations, it requires a dynamic load balancing scheme to achieve a balanced load distribution. The results of implementing a parallel region growing algorithm on the Intel-iPSC hypercube are discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122348759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The preconditioned conjugate gradient method on the hypercube","authors":"G. Abe, K. Hane","doi":"10.1145/63047.63126","DOIUrl":"https://doi.org/10.1145/63047.63126","url":null,"abstract":"A parallel algorithm for solving the elliptic partial differential equation (PDE) is described in this paper through the finite difference method (FDM). The Concurrent Preconditioned Conjugate Gradient method is developed to optimize processor load balancing. This algorithm is evaluated on a hypercube-based concurrent machine, the Intel iPSC.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116084696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An experimental study of methods for parallel preconditioned Krylov methods","authors":"D. Baxter, J. Saltz, M. Schultz, S. Eisenstat, K. Crowley","doi":"10.1145/63047.63128","DOIUrl":"https://doi.org/10.1145/63047.63128","url":null,"abstract":"High performance multiprocessor architectures differ both in the number of processors, and in the delay costs for synchronization and communication. In order to obtain good performance on a given architecture for a given problem, adequate parallelization, good balance of load and an appropriate choice of granularity are essential. We discuss the implementation of a parallel version of PCGPAK for both shared memory architectures and hypercubes. Our parallel implementation is sufficiently efficient to allow us to complete the solution of our test problems on 16 processors of the Encore Multimax/320 in an amount of time that is a small multiple of that required by a single head of a Cray X/MP, despite the fact that the peak performance of the Multimax processors is not even close to the supercomputer range. We illustrate the effectiveness of our approach on a number of model problems from reservoir engineering and mathematics.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121588162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypercube performance for 2-D seismic finite-difference modeling","authors":"L. J. Baker","doi":"10.1145/63047.63068","DOIUrl":"https://doi.org/10.1145/63047.63068","url":null,"abstract":"Wave-equation seismic modeling in two space dimensions is computationally intensive, often requiring hours of supercomputer CPU time to run typical geological models with 500 × 500 grids and 100 sources. This paper analyzes the performance of ACOUS2D, an explicit 4th-order finite-difference program, on Intel's 16-processor vector hypercube computer. The conversion of the sequential version of ACOUS2D to run on hypercube was straightforward, but time-consuming. The key consideration for optimal efficiency is load balancing. On a fairly typical geologic model, the 16-processor Intel vector hypercube computer ran ACOUS2D at 1/3 the speed of a Cray-1S.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123030423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blitz: a rule-based system for massively parallel architectures","authors":"K. Morgan","doi":"10.1145/63047.63091","DOIUrl":"https://doi.org/10.1145/63047.63091","url":null,"abstract":"The rule-based system has emerged as an important tool to developers of artificial intelligence programs. Because of the computational resources required to realize the MATCH-SELECT-EXECUTE cycle of rule-based systems, researchers have been trying to introduce parallelism into these systems for some time. We describe a new approach to parallel rule-based systems which exploits fine-grained hypercube hardware. New algorithms for parallel rule matching and simultaneous execution of several rules at once are presented. Experimental results using a Connection Machine implementation of BLITZ are presented.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131190153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binsorting on hypercubes with d-port communication","authors":"S. Seidel, W. George","doi":"10.1145/63047.63102","DOIUrl":"https://doi.org/10.1145/63047.63102","url":null,"abstract":"Three sorting algorithms are given for hypercubes with d-port communication. All of these algorithms are based on binsort at the global level. The binsort allows the movement of keys among nodes to be performed by a d-port complete exchange rather than a sequence of 1-port exchanges as in other algorithms. This lowers communication costs by at least a factor of d compared to other sorting algorithms. The first algorithm assumes the keys are uniformly distributed and selects bin boundaries based on the global maximum and minimum keys. The other two algorithms make no assumption about the distribution of keys and so they sample the keys before the binsort in order to estimate their distribution. Splitting keys based on that estimate reduce the variance among the lengths of the subsequences left in the nodes after the complete exchange of bins which in turn helps to balance the computational load in each node. The performance of two of these algorithms on an FPS T-40 is given for data of various distributions and is compared to the performance of bitonic sort and hyperquicksort.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127795040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular dynamics simulation on an iPSC of defects in crystals","authors":"P. Flinn","doi":"10.1145/63047.63084","DOIUrl":"https://doi.org/10.1145/63047.63084","url":null,"abstract":null,"PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133057620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QED on the connection machine","authors":"C. Baillie, S. Johnsson, Luis F. Ortiz, G. Pawley","doi":"10.1145/63047.63082","DOIUrl":"https://doi.org/10.1145/63047.63082","url":null,"abstract":"Physicists believe that the world is described in terms of gauge theories. A popular technique for investigating these theories is to discretize them onto a lattice and simulate numerically by a computer, yielding so-called lattice gauge theory. Such computations require at least 10^14 floating-point operations, necessitating the use of advanced architecture supercomputers such as the Connection Machine made by Thinking Machines Corporation. Currently the most important gauge theory to be solved is that describing the sub-nuclear world of high energy physics: Quantum Chromo-dynamics (QCD). The simplest example of a gauge theory is Quantum Electro-dynamics (QED), the theory which describes the interaction of electrons and photons. Simulation of QCD requires computer software very similar to that for the simpler QED problem. Our current QED code achieves a computational rate of 1.6 million lattice site updates per second for a Monte Carlo algorithm, and 7.4 million site updates per second for a microcanonical algorithm. The estimated performance for a Monte Carlo QCD code is 200,000 site updates per second (or 5.6 Gflops/sec).","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114321845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}