{"title":"Distributed orthogonal factorization","authors":"A. Pothen, P. Raghavan","doi":"10.1145/63047.63122","DOIUrl":"https://doi.org/10.1145/63047.63122","url":null,"abstract":"We describe several algorithms for computing the orthogonal factorization on distributed memory multiprocessors. One of the algorithms is based on Givens rotations, two others employ column Householder transformations but with different communication schemes: broadcast and pipelined ring. A fourth algorithm is a hybrid; it uses Householder transformations and Givens rotations in separate phases. We present expressions for the arithmetic and communication complexity of each algorithm. The algorithms were implemented on an iPSC-286 and the observed times agree well with our analyses.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114855890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale concurrent computing in artificial intelligence research","authors":"L. Gasser","doi":"10.1145/63047.63089","DOIUrl":"https://doi.org/10.1145/63047.63089","url":null,"abstract":"Research in AI is slowly maturing, and a body of accepted techniques for reasoning and for representing knowledge in simple, circumscribed domains now exists. But with the maturity of AI has come a growing awareness of the severe limitations of current techniques for constructing more complex problem solving or interpretation systems. We currently have inadequate means to gather, represent, store, organize, access, and manipulate the huge collections of knowledge required for complex problem solving. Existing systems can't reconfigure themselves in changing situations, nor can they incrementally adjust to new knowledge or new techniques. Large scale problem solvers (e.g. factory automation systems) cannot in principle completely model the world in which they exist, and must face problems of inconsistency, asynchrony, control and geographic distribution, etc. — they will have to work in “open systems.”\nMany solutions under consideration rely on concurrent computation, using either very fine grained “connectionist,” “neural computing” or “data parallel” approaches, or using larger grain collections of “objects,” “agents,” or “problem solving nodes” — techniques collectively termed “Distributed AI.” In this paper we characterize the needs for concurrency and parallelism in AI, with special attention to building medium to large grain adaptive problem solvers in open systems. In these systems the overriding concern is organizing the problem solving system's behavior — the “coordination problem.” Conventional distributed computing and parallel algorithms approaches allow a programmer to solve the coordination problem, and provide language constructs and concurrency control mechanisms with which a programmer can enact this solution. 
In Distributed AI, we attempt to improve adaptability by designing problem solvers which can both solve the coordination problem and enact the solution themselves.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116768341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid hypercube algorithm for the symmetric tridiagonal eigenvalue problem","authors":"J. A. Jackson, L. Liebrock, L. Ziegler","doi":"10.1145/63047.63113","DOIUrl":"https://doi.org/10.1145/63047.63113","url":null,"abstract":"Two versions of an algorithm for finding the eigenvalues of symmetric, tridiagonal matrices are described. They are based on the use of Sturm sequences and the bisection algorithm. The algorithms were implemented on the FPS T-Series. Some speedup factor results are presented.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117081526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vectorized dissection on the hypercube","authors":"T-H. Olesen, J. Petersen","doi":"10.1145/63047.63135","DOIUrl":"https://doi.org/10.1145/63047.63135","url":null,"abstract":"Dissection ordering is used with Gaussian elimination on a Hypercube parallel processor with vector hardware to solve matrices arising from finite-difference and finite-element discretizations of 2-D elliptic partial differential equations. These problems can be put into a matrix-vector form, Ax = f, where the matrix A takes the place of the differential operator, x is the solution vector, and f is the source vector. The domain is divided among the nodes with neighboring subdomains sharing a strip called a separator. Each processor is given its own part of the source vector and computes its own part of the stiffness matrix, A.\nThe elimination starts out in parallel; communication is only needed after most of the elimination is finished when the edges need to be eliminated. Back substitution is initially done on the domain edges, and then totally in parallel without communication on each node. The Hypercube code involved was optimized to work with vector hardware. Example problems and timings are given with comparisons to nonvector runs.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123188648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pattern recognition by neural network model on hypercubes","authors":"W. Furmanski","doi":"10.1145/63047.63055","DOIUrl":"https://doi.org/10.1145/63047.63055","url":null,"abstract":"The objective of this work is to study the performance characteristics of the back-propagation model for pattern recognition. Specifically, the test case of recognition of Chinese characters is studied on an ELXSI-6400 and MARK III hypercube. Preliminary results indicate that local spatial decomposition of characters in the training set leads to simple parallel implementation of the neural net model on hypercubes, and also serves as an effective pre-processor which provides high quality of recognition and good efficiency.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123928846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The prime factor non-binary discrete Fourier transform and use of Crystal_Router as a general purpose communication routine","authors":"G. Aloisio, Nicola Veneziani, Jai Sam Kim, G. Fox","doi":"10.1145/63047.63087","DOIUrl":"https://doi.org/10.1145/63047.63087","url":null,"abstract":"We have implemented one of the Fast Fourier Transform algorithms, the Prime Factor algorithm (PFA), on the hypercube. On sequential computers, the PFA and other discrete Fourier transforms (DFT) such as the Winograd algorithm (WFA) are known to be very efficient. However, both algorithms require full data shuffling and are thus challenging for any distributed memory parallel computer. We use a concurrent communication algorithm, called the Crystal_Router, for communicating shuffled data. We will show that the speed gained in reduced arithmetic compared to binary FFT is sufficient to overcome the extra communication requirement up to a certain number of processors. Beyond this point the standard Cooley-Tukey FFT algorithm has the best performance. We comment briefly on the application of the DFT to signal processing in synthetic aperture radar (SAR).","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125362788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Block-matrix operations using orthogonal trees","authors":"A. Elster, A. Reeves","doi":"10.1145/63047.63115","DOIUrl":"https://doi.org/10.1145/63047.63115","url":null,"abstract":"Hypercube algorithms are presented for distributed block-matrix operations. These algorithms are based entirely on an interconnection scheme which involves two orthogonal sets of binary trees. This switching topology makes use of all hypercube interconnection links in a synchronized manner.\nAn efficient novel matrix-vector multiplication algorithm based on this technique is described. Also, matrix transpose operations, moving just pointers rather than actual data, have been implemented for some applications by taking advantage of the above tree structures. For the cases where actual physical vector and matrix transposes are needed, possible techniques, including extensions of the above scheme, are discussed.\nThe algorithms support submatrix partitionings of the data, instead of being limited to row and/or column partitionings. This allows efficient use of nodal vector processors as well as shorter interprocessor communication packets. It also produces a favorable data distribution for applications which involve near neighbor operations such as image processing. The algorithms are based on an interprocessor communication paradigm which involves variable length, tagged block data transfers. 
They have been implemented on an Intel iPSC hypercube system with the support of the Hypercube Library developed at the Christian Michelsen Institute.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124579529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Flux-corrected transport algorithm on the NCUBE hypercube","authors":"D. Walker, G. Fox, G. Montry","doi":"10.1145/63047.63065","DOIUrl":"https://doi.org/10.1145/63047.63065","url":null,"abstract":"This work describes the implementation of a finite-difference algorithm, incorporating the flux-corrected transport technique, on the NCUBE hypercube. The algorithm is used to study two-dimensional, convectively-dominated fluid flows, and as a sample problem the onset and growth of the Kelvin-Helmholtz instability is investigated. Timing results are presented for a number of different sized problems on hypercubes of dimension up to 9. These results are interpreted by means of a simple performance model. The extension of the algorithm to the three-dimensional case is also discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127592145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A universal concurrent algorithm for plasma particle-in-cell simulation codes","authors":"P. Liewer, V. Decyk, J. Dawson, G. Fox","doi":"10.1145/63047.63063","DOIUrl":"https://doi.org/10.1145/63047.63063","url":null,"abstract":"We have developed a new algorithm for implementation of plasma particle-in-cell (PIC) simulation codes on concurrent processors. This algorithm, termed the universal concurrent PIC algorithm (UC-PIC), has been utilized in a one-dimensional electrostatic PIC code on the JPL Mark III Hypercube parallel computer. To decompose the problem using the UC-PIC algorithm, the physical domain of the simulation is divided into sub-domains, equal in number to the number of processors, such that all sub-domains have roughly equal numbers of particles. For problems with non-uniform particle densities, these sub-domains will be of unequal physical size. Each processor is assigned a sub-domain, with nearest neighbor sub-domains assigned to nearest neighbor processors. Using this algorithm in the Mark III PIC code, the increase in speed in going from 1 to 32 processors for the dominant portion of code (push time, defined below) was 29, yielding a parallel efficiency of 90%. 
Although implemented on a hypercube concurrent computer, this algorithm should also be efficient for PIC codes on other parallel architectures and on sequential computers where part of the data resides in external memory.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116780463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A concurrent neural network algorithm for the traveling salesman problem","authors":"N. Toomarian","doi":"10.1145/63047.63105","DOIUrl":"https://doi.org/10.1145/63047.63105","url":null,"abstract":"A binary neuromorphic data structure is used to encode the N-city Traveling Salesman Problem (TSP). In this representation the computational complexity, in terms of number of neurons, is reduced from Hopfield and Tank's O(N^2) to O(N log2 N). A continuous synchronous neural network algorithm, in conjunction with the Lagrange multiplier, is used to solve the problem. The algorithm has been implemented on the NCUBE hypercube multiprocessor. This algorithm converges faster and has a higher probability to reach a valid tour than previously available results.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134497902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}