{"title":"Intrinsically parallel multiscale algorithms for hypercubes","authors":"P. Frederickson, O. McBryan","doi":"10.1145/63047.63131","DOIUrl":"https://doi.org/10.1145/63047.63131","url":null,"abstract":"Most algorithms implemented on parallel computers have been optimal serial algorithms, slightly modified or parallelized. An exciting possibility is the search for intrinsically parallel algorithms. These are algorithms which do not have a sensible serial equivalent — any serial equivalent is so inefficient as to be of little use. We describe a multiscale algorithm for the solution of PDE systems that is designed specifically for massively parallel supercomputers. Unlike conventional multigrid algorithms, the new algorithm utilizes the same number of processors at all times. Convergence rates are much faster than for standard multigrid methods — the solution error decreases by up to three digits per iteration. The basic idea is to solve many coarse scale problems simultaneously, combining the results in an optimal way to provide an improved fine scale solution. On massively parallel machines the improved convergence rate is attained at no extra computational cost since processors that would otherwise be sitting idle are utilized to provide the better convergence. Furthermore the algorithm is ideally suited to SIMD computers as well as MIMD computers. On serial machines the algorithm is much slower than standard multigrid because of the extra time spent on multiple coarse scales, though in certain cases the improved convergence rate may justify this — primarily in cases where other methods do not converge. The algorithm provides an extremely fast solution of various standard elliptic equations on machines such as the 65,536 processor Connection Machine, and uses only O(log(N)) parallel machine instructions to solve such equations. The discovery of this algorithm was motivated entirely by new hardware. It was a surprise to the authors to find that developments in computer architecture might lead to new mathematics. Undoubtedly further intrinsically parallel algorithms await discovery.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132802207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-grain pipelining on hypercube multiprocessors","authors":"C. King, L.M. Ni","doi":"10.1145/63047.63119","DOIUrl":"https://doi.org/10.1145/63047.63119","url":null,"abstract":"A new paradigm, called large-grain pipelining, for developing efficient parallel algorithms on distributed-memory multiprocessors, e.g., hypercube machines, is introduced. Large-grain pipelining attempts to maximize the degree of overlapping and minimize the effect of communication overhead in a multiprocessor system through macro-pipelining between the nodes. Algorithms developed through large-grain pipelining to perform matrix multiplication are presented. To model the pipelined computations, an analytic model is introduced, which takes into account both the underlying architecture and algorithm behavior. Through the analytic model, important design parameters, such as data partition sizes, can be determined. Experiments were conducted on a 64-node NCUBE multiprocessor. The measured results closely match the analytical results, which establishes the analytic model as an integral part of algorithm design. Comparison with an algorithm which does not use large-grain pipelining also shows that large-grain pipelining is an efficient scheme for achieving greater parallelism.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121843388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Piriform (Olfactory) cortex model on the hypercube","authors":"J. Bower, M. Nelson, M. Wilson, G. Fox, W. Furmanski","doi":"10.1145/63047.63052","DOIUrl":"https://doi.org/10.1145/63047.63052","url":null,"abstract":"We present a concurrent hypercube implementation of a neurophysiological model for the piriform (olfactory) cortex. The project was undertaken as the first step towards constructing a general neural network simulator on the hypercube, suitable both for applied and biological nets. The method presented here is expected to be useful for a class of complex and computationally expensive network models with long range connectivity and non-homogeneous activity patterns. The hypercube communication for the fully interconnected case is efficiently realized by the fold algorithm, constructed previously for problems in concurrent matrix algebra, whereas the patchy activity is successfully load balanced by the scattered decomposition. We also briefly discuss other communication strategies relevant for sparse and variable connectivities. Sample numerical results presented here were derived on the NCUBE hypercube at Caltech.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130216153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing Gauss Jordan on a hypercube multicomputer","authors":"A. Gerasoulis, Nikolaos Missirlis, I. Nelken, R. Peskin","doi":"10.1145/63047.63117","DOIUrl":"https://doi.org/10.1145/63047.63117","url":null,"abstract":"We consider the solution of dense algebraic systems on the NCUBE hypercube via the Gauss Jordan method. Advanced loop interchange techniques are used to determine the appropriate algorithm for MIMD architectures. For a computer with p = n processors, we show that Gauss Jordan is competitive with Gaussian elimination when pivoting is not used. We experiment with three mappings of columns to processors: block, wrap and reflection. We demonstrate that load balancing the processors results in a considerable reduction of execution time.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127168297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Process and workload migration for a parallel branch-and-bound algorithm on a hypercube multicomputer","authors":"K. Schwan, J. Gawkowski, Sen Blake","doi":"10.1145/63047.63110","DOIUrl":"https://doi.org/10.1145/63047.63110","url":null,"abstract":"This paper describes the design and experimental evaluation of a novel parallel implementation of a branch-and-bound algorithm for solving the Traveling Salesperson Problem on a 32 node Intel hypercube. Issues studied experimentally are trade-offs in speed, memory, and communication costs as well as the effects of workload balancing and node utilization on speedup. Since the actual distribution of work among the parallel tasks of the TSP application cannot be predicted in advance, strategies and trade-offs regarding the migration of processes from heavily loaded processors or the migration of work from heavily loaded processes can be studied. Toward this end, we have implemented operating system constructs for work and for process migration as extensions to the Intel iPSC hypercube's operating system. Furthermore, operating system support for the rapid sharing of intermediate values of the global objective function being optimized (i.e. 'tour' values in TSP) is provided.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127428482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finding eigenvalues and eigenvectors of unsymmetric matrices using a hypercube multiprocessor","authors":"A. Geist, R. Ward, G. J. Davis, R. Funderlic","doi":"10.1145/63047.63118","DOIUrl":"https://doi.org/10.1145/63047.63118","url":null,"abstract":"Distributed-memory algorithms for finding the eigenvalues and eigenvectors of a dense unsymmetric matrix are given. While several parallel algorithms have been developed for symmetric systems, little work has been done on the unsymmetric case. Our parallel implementation proceeds in three major steps: reduction of the original matrix to Hessenberg form, application of the implicit double-shift QR algorithm to compute the eigenvalues, and back transformations to compute the eigenvectors. Several modifications to our parallel QR algorithm, including ring communication and pipelining, are discussed and compared. Results and timings are given.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130831714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Best-first branch-and-bound on a hypercube","authors":"E. Felten","doi":"10.1145/63047.63107","DOIUrl":"https://doi.org/10.1145/63047.63107","url":null,"abstract":"The branch-and-bound technique is a common method for finding exact solutions to difficult problems in combinatorial optimization. This paper will discuss issues surrounding implementation of a particular branch-and-bound algorithm for the traveling-salesman problem on a hypercube multicomputer. The natural parallel algorithm is based on a number of asynchronous processes which interact through a globally shared list of unfinished work. In a distributed-memory environment we must find a non-centralized version of this shared data structure. In addition, detecting termination of the computation is tricky; an algorithm will be presented which ensures proper termination. Finally, issues affecting performance will be discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114213287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing a distributed combat simulation on the Time Warp operating system","authors":"F. Wieland, L. Hawley, A. Feinberg","doi":"10.1145/63047.63080","DOIUrl":"https://doi.org/10.1145/63047.63080","url":null,"abstract":"Utilizing the Time Warp Operating System, the CTLS project at JPL has produced a distributed combat simulation called STB-87 and measured its performance on the JPL Mark III Hypercube. By applying the spiral model of software development, the CTLS project will produce a series of software test beds, to culminate in the completion of a working prototype theater level simulation three to five years hence. STB-87, the first software test bed, is a ground-based combat simulation decomposed into objects which communicate via time-stamped messages. The use of incremental object-based design, coding, and testing has been helpful when developing a parallel simulation. The performance measurements show that, with the appropriate choice of object granularity, STB-87 is able to achieve a speedup factor of 12 running on a 32-node Mark III Hypercube.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"142 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122667976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Finite difference time domain solution of electromagnetic scattering on the hypercube","authors":"Ruel H. Calalo, J. Lyons, W. Imbriale","doi":"10.1145/63047.63062","DOIUrl":"https://doi.org/10.1145/63047.63062","url":null,"abstract":"Electromagnetic fields interacting with a dielectric or conducting structure produce scattered electromagnetic fields. To model the fields produced by complicated, volumetric structures, the finite difference time domain (FDTD) method employs an iterative solution to Maxwell's time dependent curl equations. Implementations of the FDTD method use memory intensively and perform numerous calculations per time step iteration. We implemented an FDTD code on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. This code allows us to solve problems requiring as many as 2,048,000 unit cells on a 32 node Hypercube. For smaller problems, the code produces solutions in a fraction of the time required to solve the same problems on sequential computers.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123310240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Shift-register sequence random number generators on the hypercube concurrent computers","authors":"T. Chiu","doi":"10.1145/63047.63098","DOIUrl":"https://doi.org/10.1145/63047.63098","url":null,"abstract":"We discuss the design of a class of shift-register sequence random number generators for MIMD parallel computers, and particularly for the hypercube concurrent computers. The simplest implementation is to have each processor generate its own sequence, provided that the initial seeds are linearly independent. We generate these initial seeds by using distinct linear congruential generators and finally XORing them bit-by-bit with the system time in microseconds. Our shift-register sequence random number generators are coded in C and run under CUBIX. The statistical tests are performed on each sequence generated by every single processor as well as on the combined sequence produced by all processors. The tests include chi-square, Kolmogorov-Smirnov, auto-correlation, run-length and n-tuple distribution tests. A statistical test has been devised for testing the sequences of random numbers generated by a MIMD parallel computer. Our test results indicate that our generators do provide independent sequences of random numbers with extremely long periods.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"721 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121998649","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}