{"title":"Solution of the 3-D Euler equations for the flow about a fighter aircraft configuration using a hypercube parallel processor","authors":"D. Weissbein, J. F. Mangus, M. W. George","doi":"10.1145/63047.63066","DOIUrl":"https://doi.org/10.1145/63047.63066","url":null,"abstract":"The Computational Fluid Dynamics (CFD) code FLO57, which solves the 3-D Euler equations using an explicit, finite volume, Runge-Kutta algorithm, was implemented on an Intel IPSC-MX parallel processor. Spatial decomposition was effected on the solution grid about a fighter aircraft configuration, and binary reflected Gray codes were used to map the computational domain onto the IPSC, ensuring nearest neighbor communication. Results and timings of the implementation are presented with a comparison of the IPSC and a uniprocessor machine of similar classification to assess the performance of the IPSC on FLO57. Suggested improvements to the current version of the parallelized code are listed to aid load balancing, vectorization, and more efficient memory use.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123253724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypercube data analysis in astronomy: optical interferometry and millisecond pulsar searches","authors":"P. Gorham, T. Prince, S. Anderson","doi":"10.1145/63047.63049","DOIUrl":"https://doi.org/10.1145/63047.63049","url":null,"abstract":"Astronomical data sets are beginning to live up to their name, in both their sizes and the complexity of the analysis required. Here we discuss two astronomical data analysis problems which we have begun to implement on a hypercube concurrent processor environment: The intensive image processing required in an optical interferometry project, and the large scale power spectral analysis required by a search for millisecond-period radio pulsars. In both cases the analysis proceeds largely in the Fourier domain, and we find that the problems are readily adapted to a concurrent environment. In the following report, we outline briefly the astronomical background for each problem, then discuss the general computational requirements, and finally present possible hypercube algorithms and results achieved to date.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126260391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Region growing on a hypercube multiprocessor","authors":"M. Willebeek-LeMair, A. Reeves","doi":"10.1145/63047.63057","DOIUrl":"https://doi.org/10.1145/63047.63057","url":null,"abstract":"The region growing paradigm for image segmentation groups neighboring pixels into regions depending upon a predetermined homogeneity criterion. A parallel method for region growing on an MIMD multiprocessor system is presented. Since the region growing problem exhibits non-uniform and unpredictable load fluctuations, it requires a dynamic load balancing scheme to achieve a balanced load distribution. The results of implementing a parallel region growing algorithm on the Intel-iPSC hypercube are discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122348759","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The preconditioned conjugate gradient method on the hypercube","authors":"G. Abe, K. Hane","doi":"10.1145/63047.63126","DOIUrl":"https://doi.org/10.1145/63047.63126","url":null,"abstract":"This paper describes a parallel algorithm for solving the elliptic partial differential equation (PDE) through the finite difference method (FDM). The Concurrent Preconditioned Conjugate Gradient method is developed to optimize processor load balancing. This algorithm is evaluated on a hypercube-based concurrent machine, the Intel iPSC.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116084696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An experimental study of methods for parallel preconditioned Krylov methods","authors":"D. Baxter, J. Saltz, M. Schultz, S. Eisenstat, K. Crowley","doi":"10.1145/63047.63128","DOIUrl":"https://doi.org/10.1145/63047.63128","url":null,"abstract":"High performance multiprocessor architectures differ both in the number of processors and in the delay costs for synchronization and communication. In order to obtain good performance on a given architecture for a given problem, adequate parallelization, good balance of load and an appropriate choice of granularity are essential.\nWe discuss the implementation of a parallel version of PCGPAK for both shared memory architectures and hypercubes. Our parallel implementation is sufficiently efficient to allow us to complete the solution of our test problems on 16 processors of the Encore Multimax/320 in an amount of time that is a small multiple of that required by a single head of a Cray X/MP, despite the fact that the peak performance of the Multimax processors is not even close to the supercomputer range. We illustrate the effectiveness of our approach on a number of model problems from reservoir engineering and mathematics.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121588162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypercube performance for 2-D seismic finite-difference modeling","authors":"L. J. Baker","doi":"10.1145/63047.63068","DOIUrl":"https://doi.org/10.1145/63047.63068","url":null,"abstract":"Wave-equation seismic modeling in two space dimensions is computationally intensive, often requiring hours of supercomputer CPU time to run typical geological models with 500 × 500 grids and 100 sources. This paper analyzes the performance of ACOUS2D, an explicit 4th-order finite-difference program, on Intel's 16-processor vector hypercube computer. The conversion of the sequential version of ACOUS2D to run on the hypercube was straightforward but time-consuming. The key consideration for optimal efficiency is load balancing. On a fairly typical geologic model, the 16-processor Intel vector hypercube computer ran ACOUS2D at 1/3 the speed of a Cray-1S.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123030423","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blitz: a rule-based system for massively parallel architectures","authors":"K. Morgan","doi":"10.1145/63047.63091","DOIUrl":"https://doi.org/10.1145/63047.63091","url":null,"abstract":"The rule-based system has emerged as an important tool to developers of artificial intelligence programs. Because of the computational resources required to realize the MATCH-SELECT-EXECUTE cycle of rule-based systems, researchers have been trying to introduce parallelism into these systems for some time. We describe a new approach to parallel rule-based systems which exploits fine-grained hypercube hardware. New algorithms for parallel rule matching and for simultaneous execution of several rules are presented. Experimental results using a Connection Machine implementation of BLITZ are presented.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131190153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Binsorting on hypercubes with d-port communication","authors":"S. Seidel, W. George","doi":"10.1145/63047.63102","DOIUrl":"https://doi.org/10.1145/63047.63102","url":null,"abstract":"Three sorting algorithms are given for hypercubes with d-port communication. All of these algorithms are based on binsort at the global level. The binsort allows the movement of keys among nodes to be performed by a d-port complete exchange rather than a sequence of 1-port exchanges as in other algorithms. This lowers communication costs by at least a factor of d compared to other sorting algorithms. The first algorithm assumes the keys are uniformly distributed and selects bin boundaries based on the global maximum and minimum keys. The other two algorithms make no assumption about the distribution of keys and so they sample the keys before the binsort in order to estimate their distribution. Splitting keys based on that estimate reduce the variance among the lengths of the subsequences left in the nodes after the complete exchange of bins, which in turn helps to balance the computational load in each node. The performance of two of these algorithms on an FPS T-40 is given for data of various distributions and is compared to the performance of bitonic sort and hyperquicksort.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127795040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Molecular dynamics simulation on an iPSC of defects in crystals","authors":"P. Flinn","doi":"10.1145/63047.63084","DOIUrl":"https://doi.org/10.1145/63047.63084","url":null,"abstract":null,"PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133057620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Block-matrix operations using orthogonal trees","authors":"A. Elster, A. Reeves","doi":"10.1145/63047.63115","DOIUrl":"https://doi.org/10.1145/63047.63115","url":null,"abstract":"Hypercube algorithms are presented for distributed block-matrix operations. These algorithms are based entirely on an interconnection scheme which involves two orthogonal sets of binary trees. This switching topology makes use of all hypercube interconnection links in a synchronized manner.\nAn efficient novel matrix-vector multiplication algorithm based on this technique is described. Also, matrix transpose operations moving just pointers rather than actual data, have been implemented for some applications by taking advantage of the above tree structures. For the cases where actual physical vector and matrix transposes are needed, possible techniques, including extensions of the above scheme, are discussed.\nThe algorithms support submatrix partitionings of the data, instead of being limited to row and/or column partitionings. This allows efficient use of nodal vector processors as well as shorter interprocessor communication packets. It also produces a favorable data distribution for applications which involve near neighbor operations such as image processing. The algorithms are based on an interprocessor communication paradigm which involves variable length, tagged block data transfers. They have been implemented on an Intel iPSC hypercube system with the support of the Hypercube Library developed at the Christian Michelsen Institute.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124579529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}