{"title":"A concurrent neural network algorithm for the traveling salesman problem","authors":"N. Toomarian","doi":"10.1145/63047.63105","DOIUrl":"https://doi.org/10.1145/63047.63105","url":null,"abstract":"A binary neuromorphic data structure is used to encode the N — city Traveling Salesman Problem (TSP). In this representation the computational complexity, in terms of number of neurons, is reduced from Hopfield and Tank's &Ogr;(N2) to &Ogr;(N log2 N). A continuous synchronous neural network algorithm in conjunction with the LaGrange multiplier, is used to solve the problem. The algorithm has been implemented on the NCUBE hypercube multiprocessor. This algorithm converges faster and has a higher probability to reach a valid tour than previously available results.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134497902","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale concurrent computing in artificial intelligence research","authors":"L. Gasser","doi":"10.1145/63047.63089","DOIUrl":"https://doi.org/10.1145/63047.63089","url":null,"abstract":"Research in AI is slowly maturing, and body of accepted techniques for reasoning and for representing knowledge in simple, circumscribed domains now exists. But with the maturity of AI has come a growing awareness of the severe limitations of current techniques for constructing more complex problem solving or interpretation systems. We currently have inadequate means to gather, represent, store, organize, access, and manipulate the huge collections of knowledge required for complex problem solving. Existing systems can't reconfigure themselves in changing situations, nor can they incrementally adjust to new knowledge or new techniques. Large scale problem solvers (e.g. factory automation systems) cannot in principle completely model the world in which they exist, and must face problems of inconsistency, asynchrony, control and geographic distribution, etc. — they will have to work in “open systems.”\u0000Many solutions under consideration rely on concurrent computation, using either very fine grained “connectionist,” “neural computing” or “data parallel” approaches, or using larger grain collections of “objects,” “agents,” or “problem solving nodes” — techniques collectively termed “Distributed AI.” In this paper we characterize the needs for concurrency and parallelism in AI, with special attention to building medium to large grain adaptive problem solvers in open systems. In these systems the overriding concern is organizing the problem solving system's behavior — the “coordination problem.” Conventional distributed computing and parallel algorithms approaches allow a programmer to solve the coordination problem, and provide language constructs and concurrency control mechanisms with which a program can enact his solution. 
In Distributed AI, we attempt to improve adaptability by designing problem solvers which can both solve the coordination problem and enact the solution themselves.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116768341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid hypercube algorithm for the symmetric tridiagonal eigenvalue problem","authors":"J. A. Jackson, L. Liebrock, L. Ziegler","doi":"10.1145/63047.63113","DOIUrl":"https://doi.org/10.1145/63047.63113","url":null,"abstract":"Two versions of an algorithm for finding the eigenvalues of symmetric, tridiagonal matrices are described. They are based on the use of the Sturm sequences and the bisection algorithm. The algorithms were implemented on the FPS T-Series. Some speedup factor results are presented.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117081526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A universal concurrent algorithm for plasma particle-in-cell simulation codes","authors":"P. Liewer, V. Decyk, J. Dawson, G. Fox","doi":"10.1145/63047.63063","DOIUrl":"https://doi.org/10.1145/63047.63063","url":null,"abstract":"We have developed a new algorithm for implementation of plasma particle-in-cell (PIC) simulation codes on concurrent processors. This algorithm, termed the universal concurrent PIC algorithm (UC-PIC), has been utilized in a one-dimensional electrostatic PIC code on the JPL Mark III Hypercube parallel computer. To decompose the problem using the UC-PIC algorithm, the physical domain of the simulation is divided into sub-domains, equal in number to the number of processors, such that all sub-domains have roughly equal numbers of particles. For problems with non-uniform particle densities, these sub-domains will be of unequal physical size. Each processor is assigned, a sub-domain, with nearest neighbor sub-domains assigned to nearest neighbor processors. Using this algorithm in the Mark III PIC code, the increase in speed in going from 1 to 32 processors for the dominant portion of code (push time, defined below) was 29, yielding a parallel efficiency of 90%. 
Although implemented on a hypercube concurrent computer, this algorithm should also be efficient for PIC codes on other parallel architectures and on sequential computers where part of the data resides in external memory.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116780463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed orthogonal factorization","authors":"A. Pothen, P. Raghavan","doi":"10.1145/63047.63122","DOIUrl":"https://doi.org/10.1145/63047.63122","url":null,"abstract":"We describe several algorithms for computing the orthogonal factorization on distributed memory multiprocessors. One of the algorithms is based on Givens rotations, two others employ column Householder transformations but with different communication schemes: broadcast and pipelined ring. A fourth algorithm is a hybrid; it uses Househlolder transformations and Givens rotations in separate phases. We present expressions for the arithmetic and communication complexity of each algorithm. The algorithms were implemented on an iPSC-286 and the observed times agree well with our analyses.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114855890","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pattern recognition by neural network model on hypercubes","authors":"W. Furmanski","doi":"10.1145/63047.63055","DOIUrl":"https://doi.org/10.1145/63047.63055","url":null,"abstract":"The objective of this work is to study the performance characteristics of the back-propagation model for pattern recognition. Specifically, the test case of recognition of Chinese characters is studied on an ELXSI-6400 and MARK III hypercube. Preliminary results indicate that local spatial decomposition of characters in the training set leads to simple parallel implementation of the neural net model on hypercubes, and also serves as an effective pre-processor which provides high quality of recognition and good efficiency.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123928846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The prime factor non-binary discrete Fourier transform and use of Crystal_Router as a general purpose communication routine","authors":"G. Aloisio, Nicola Veneziani, Jai Sam Kim, G. Fox","doi":"10.1145/63047.63087","DOIUrl":"https://doi.org/10.1145/63047.63087","url":null,"abstract":"We have implemented one of the Fast Fourier Transform algorithms, the Prime Factor algorithm (PFA), on the hypercube. On sequential computers, the PFA and other discrete Fourier transforms (DFT) such as the Winograd algorithm (WFA) are known to be very efficient. However, both algorithms require full data shuffling and are thus challenging to any distributed memory parallel computers. We use a concurrent communication algorithm, called the Crystal_Router for communicating shuffled data. We will show that the speed gained in reduced arithmetic compared to binary FFT is sufficient to overcome the extra communication requirement up to a certain number of processors. Beyond this point the standard Cooley-Tukey FFT algorithm has the best performance. We comment briefly on the application of the DFT to signal processing in synthetic aperture radar (SAR).","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125362788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vectorized dissection on the hypercube","authors":"T-H. Olesen, J. Petersen","doi":"10.1145/63047.63135","DOIUrl":"https://doi.org/10.1145/63047.63135","url":null,"abstract":"Dissection ordering is used with Gaussian elimination on a Hypercube parallel processor with vector hardware to solve matrices arising from finite-difference and finite-element discretizations of 2-D elliptic partial differential equations. These problems can be put into a matrix-vector form, Ax = f, where the matrix A takes the place of the differential operator, x is the solution vector, and f is the source vector. The domain is divided among the nodes with neighboring subdomains sharing a strip called a separator. Each processor is given its own part of the source vector and computes its own part of the stiffness matrix, A.\u0000The elimination starts out in parallel; communication is only needed after most of the elimination is finished when the edges need to be eliminated. Back substitution is initially done on the domain edges, and then totally in parallel without communication on each node. The Hypercube code involved was optimized to work with vector hardware. Example problems and timings are given with comparisons to nonvector runs.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123188648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Flux-corrected transport algorithm on the NCUBE hypercube","authors":"D. Walker, G. Fox, G. Montry","doi":"10.1145/63047.63065","DOIUrl":"https://doi.org/10.1145/63047.63065","url":null,"abstract":"This work describes the implementation of a finite-difference algorithm, incorporating the flux-corrected transport technique, on the NCUBE hypercube. The algorithm is used to study two-dimensional, convectively-dominated fluid flows, and as a sample problem the onset and growth of the Kelvin-Helmholtz instability is investigated. Timing results are presented for a number of different sized problems on hypercubes of dimension up to 9. These results are interpreted by means of a simple performance model. The extension of the algorithm to the three-dimensional case is also discussed.","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127592145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QED on the connection machine","authors":"C. Baillie, S. Johnsson, Luis F. Ortiz, G. Pawley","doi":"10.1145/63047.63082","DOIUrl":"https://doi.org/10.1145/63047.63082","url":null,"abstract":"Physicists believe that the world is described in terms of gauge theories. A popular technique for investigating these theories is to discretize them onto a lattice and simulate numerically by a computer, yielding so-called lattice gauge theory. Such computations require at least 1014 floating-point operations, necessitating the use of advanced architecture supercomputers such as the Connection Machine made by Thinking Machines Corporation. Currently the most important gauge theory to be solved is that describing the sub-nuclear world of high energy physics: Quantum Chromo-dynamics (QCD). The simplest example of a gauge theory is Quantum Electro-dynamics (QED), the theory which describes the interaction of electrons and photons. Simulation of QCD requires computer software very similar to that for the simpler QED problem. Our current QED code achieves a computational rate of 1.6 million lattice site updates per second for a Monte Carlo algorithm, and 7.4 million site updates per second for a microcanonical algorithm. The estimated performance for a Monte Carlo QCD code is 200,000 site updates per second (or 5.6 Gflops/sec).","PeriodicalId":299435,"journal":{"name":"Conference on Hypercube Concurrent Computers and Applications","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1989-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114321845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}