{"title":"Task allocation onto a hypercube by recursive mincut bipartitioning","authors":"F. Erçal, J. Ramanujam, P. Sadayappan","doi":"10.1145/62297.62323","DOIUrl":"https://doi.org/10.1145/62297.62323","url":null,"abstract":"An efficient recursive task allocation scheme, based on the Kernighan-Lin mincut bisection heuristic, is proposed for the effective mapping of tasks of a parallel program onto a hypercube parallel computer. It is evaluated by comparison with an adaptive, scaled simulated annealing method. The recursive allocation scheme is shown to be effective on a number of large test task graphs - its solution quality is nearly as good as that produced by simulated annealing, and its computation time is several orders of magnitude less.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114223803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Portable programming within a message-passing model: the FFT as an example","authors":"D. Walker","doi":"10.1145/63047.63100","DOIUrl":"https://doi.org/10.1145/63047.63100","url":null,"abstract":"This paper describes a portable programming environment for MIMD concurrent processors based on an object-oriented, message-passing paradigm. The basis of this environment is the Virtual Machine Loosely Synchronous Communication System (VMLSCS) which is designed to be used for loosely synchronous problems. VMLSCS is structured to make efficient use of hierarchical memory, and permits communication and calculation to be overlapped on certain concurrent processors. As an example, the use of VMLSCS in performing both one-dimensional and multi-dimensional fast Fourier transforms (FFTs) on concurrent multiprocessors is described. It is shown that all necessary interprocessor communication can be performed by a single routine, vm_index. Thus the construction of a portable concurrent FFT rests on the implementation of vm_index on the target machines. In the multi-dimensional algorithm a strip decomposition is applied to each of the directions in turn so that each of the FFTs performed in a particular direction is done in one processor. This allows fast sequential one-dimensional FFTs to be exploited. 
The implementation of vm_index on both homogeneous and inhomogeneous hypercubes, and shared memory multiprocessors is discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117340734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal matrix algorithms on homogeneous hypercubes","authors":"G. Fox, W. Furmanski, D. Walker","doi":"10.1145/63047.63125","DOIUrl":"https://doi.org/10.1145/63047.63125","url":null,"abstract":"This paper describes a set of concurrent algorithms for matrix algebra, based on a library of collective communication routines for the hypercube. We show how a systematic application of scattering reduces load imbalance. A number of examples are considered (Gaussian elimination, Gauss-Jordan matrix inversion, the power method for eigenvectors, and tridiagonalisation by Householder's method), and the concurrent efficiencies are discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123320469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An interactive system for seismic velocity analysis","authors":"C. Addison, J. M. Cook, L. R. Hagen","doi":"10.1145/63047.63067","DOIUrl":"https://doi.org/10.1145/63047.63067","url":null,"abstract":"Seismic data processing is a time-consuming operation. Two main reasons for this are the large amounts of data, which have to be put in and taken out of the computer regularly, and the need for intervention by an expert at many intermediate stages. A subsidiary cause is the large amount of computation. The recent appearance of multiprocessor computers has created the opportunity to provide an interactive system to ease and speed-up the seismic data processing cycle.\nThis paper describes the development of an initial interactive system for the velocity analysis and NMO/stacking stage of seismic processing. Computational power and storage are supplied by two 32-node Intel iPSC/1 Hypercubes, both with 16 memory nodes and 16 vector nodes. Each hypercube has 96 Mbytes of memory and a top processing speed of over 100 Mflops (32 bit). A SUN-3 Workstation is used to display intermediate results and to enable the expert to direct the processing more efficiently.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114274633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Use of the hypercube for symbolic quantum chromodynamics","authors":"A. Kolawa, G. Fox","doi":"10.1145/63047.63097","DOIUrl":"https://doi.org/10.1145/63047.63097","url":null,"abstract":"A new numerical approach by Furmanski and Kolawa to quantum chromodynamics is based on diagonalizing the underlying Hamiltonian. This method involves the generation of states by repeated action of a potential operator. This symbolic calculation is dominated by the time it takes to search the database of existing states to verify if a generated state is identical to one previously found. We implement this algorithm on the Caltech/JPL Mark II hypercube and analyze its performance using both a simple database search and one optimized for this application. We show that the hypercube performance can be modelled in a fashion similar to conventional numerical (loosely synchronous) applications.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129995190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and implementation of a concurrent image processing workstation based on the Mark III hypercube","authors":"S. Groom, M. Lee, A. Mazer, W. Williams","doi":"10.1145/63047.63086","DOIUrl":"https://doi.org/10.1145/63047.63086","url":null,"abstract":"Various image processing algorithms have been implemented on the hypercube architecture and many success stories have been reported. However, the traditional approach to programming the hypercube has been to write programs which perform a single operation or a fixed set of operations upon data items. This approach has several drawbacks when considered for use in an interactive computing environment. First, it is difficult to process data with a sequence of simple programs in the Mark III Hypercube because the Mark III software does not support sharing of data between successive programs. This means that data must be reloaded into the cube for each individual program. It also implies that programs should be fairly large and complete, to minimize the repeated downloading of large data items for multiple programs. However, the entire program must be able to fit within the hypercube node memory, which limits what a program can do by putting a restriction on its size. Furthermore, large programs limit the amount of memory available for data, which must also be present in memory if the communications overhead is to be effectively reduced. 
The development of an interactive image processing workstation based on the Mark III Hypercube requires satisfactory solutions to these and other problems.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"44 9-10","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132286906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hypercube implementation of the simplex algorithm","authors":"C. Stunkel, D. Reed","doi":"10.1145/63047.63104","DOIUrl":"https://doi.org/10.1145/63047.63104","url":null,"abstract":"Large, sparse, linear systems of equations arise frequently when constructing mathematical models of natural phenomena. Most often, these linear systems are fully constrained and can be solved via direct or iterative techniques. However, one important problem class requires solutions to underconstrained linear systems that maximize some objective function. These linear optimization problems are natural formulations of many business plans and often contain hundreds of equations with thousands of variables. Historically, linear optimization problems have been solved via the simplex method. Despite the excellent performance of the simplex method, the size of the optimization problems and the frequency of their solution make linear optimization a computationally taxing endeavor. This paper examines the performance of parallel variants of the simplex algorithm on the Intel iPSC, a message-based parallel system. Linear optimization test data are drawn from commercial sources and represent realistic problems. Analysis shows that the speedup obtained is sensitive to both the structure of the underlying data and the data partitioning.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115405866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite element solution of thermal convection on a hypercube concurrent computer","authors":"M. Gurnis, A. Raefsky, G. Lyzenga, B. Hager","doi":"10.1145/63047.63070","DOIUrl":"https://doi.org/10.1145/63047.63070","url":null,"abstract":"Numerical solutions to thermal convection flow problems are vital to many scientific and engineering problems. One fundamental geophysical problem is the thermal convection responsible for continental drift and sea floor spreading. The earth's interior undergoes slow creeping flow (~cm/yr) in response to the buoyancy forces generated by temperature variations caused by the decay of radioactive elements and secular cooling. Convection in the earth's mantle, the 3000 km thick solid layer between the crust and core, is difficult to model for three reasons: (1) Complex rheology -- the effective viscosity depends exponentially on temperature, on pressure (or depth) and on the deviatoric stress; (2) the buoyancy forces driving the flow occur in boundary layers thin in comparison to the total depth; and (3) spherical geometry -- the flow in the interior is fully three dimensional. Because of these many difficulties, accurate and realistic simulations of this process easily overwhelm current computer speed and memory (including the Cray XMP and Cray 2) and only simplified problems have been attempted [e.g. Christensen and Yuen, 1984; Gurnis, 1988; Jarvis and Peltier, 1982].\nAs a start in overcoming these difficulties, a number of finite element formulations have been explored on hypercube concurrent computers. Although two coupled equations are required to solve this problem (the momentum or Stokes equation and the energy or advection-diffusion equation), we will concentrate our efforts on the solution to the latter equation in this paper. Solution of the former equation is discussed elsewhere [Lyzenga, et al, 1988]. We will demonstrate that linear speedups and efficiencies of 99 percent are achieved for sufficiently large problems.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116003706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Logic fault simulation on a vector hypercube multiprocessor","authors":"F. Özgüner, C. Aykanat, O. Khalid","doi":"10.1145/63047.63064","DOIUrl":"https://doi.org/10.1145/63047.63064","url":null,"abstract":"Fault simulation is the process of simulating the response of a logic circuit to input patterns in the presence of all possible single faults and is an essential part of test generation for VLSI circuits. Parallelization of the deductive and parallel simulation methods on a hypercube multiprocessor, and vectorization of the parallel simulation method, are described. Experimental results are presented.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124027482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation and performance analysis of parallel assignment algorithms on a hypercube computer","authors":"Barry K. Carpenter, Nathaniel J. Davis IV","doi":"10.1145/63047.63077","DOIUrl":"https://doi.org/10.1145/63047.63077","url":null,"abstract":"The process of effectively coordinating and controlling resources during a military engagement is known as battle management/command, control, and communications (BM/C3). One key task of BM/C3 is allocating weapons to destroy targets. The focus of this research is on developing parallel computation methods to achieve fast and cost effective assignment of weapons to targets. Using the sequential Hungarian method for solving the assignment problem as a basis, this paper presents the development and the relative performance comparison of four parallel assignment methodologies that have been implemented on the Intel iPSC hypercube computer. The first three approaches are approximations to the optimal assignment solution. The advantage of these is that they are computationally fast and have proven to generate assignments that are very close to the optimal assignment in terms of cost. The fourth approach is a parallel implementation of the Hungarian algorithm, where certain subtasks are performed in parallel. This approach produces an optimal assignment as compared to the sub-optimal assignments that result from the first three approaches. 
The relative performance of the four approaches is compared by varying the number of weapons and targets, the number of processors used, and the size of the problem partitions.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127225418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}