{"title":"Distributed particle based fluid flow simulation","authors":"T. Gilman, T. Huntsberger, P. Sharma","doi":"10.1109/DMCC.1991.633164","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633164","url":null,"abstract":"Many attempts have been made to simulate the motion of non-rigid objects. While there have been many successes in this area, the animation of fluids is still a relatively unconquered frontier. This paper describes a distributed model for fluid flow study based on behavioral simulation of atom-like particles. These particles define the size and shape of the fluid. In addition, these particles have inertia and respond to attraction, repulsion and gravitation. Unlike previous fluid flow systems, inter-particle forces are explicitly included in the model. A distributed mapping of the particle database similar to recent load-balanced PIC studies [5, 6] allows large numbers of particles to be included in the model. We also present the results of some experimental studies performed on the NCUBE/lD system at the University of South Carolina.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129349447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault Tolerant Communication in the C.NET High Level Programming Environment","authors":"J. Adamo, J. Benneville, C. Bonello, L. Trejo","doi":"10.1109/DMCC.1991.633361","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633361","url":null,"abstract":"This work is part of a high-level environment we are developing for a reconfigurable transputer-based machine. It deals with the design of a virtual channel monitor. A protocol is described which, among other things, allows pre-emption of communications and possible failure of the links to be handled consistently.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133065115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Implementing Agenda Parallelism in Production Systems","authors":"G. A. Sawyer, G. Lamont","doi":"10.1109/DMCC.1991.633218","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633218","url":null,"abstract":"Parallel rule execution (agenda parallelism) represents a relatively unexplored method for increasing the execution speed of production systems on parallel computer architectures. Agenda parallelism possesses the potential for increasing the execution speed of parallel production systems by an order of magnitude. However, agenda parallelism also introduces a number of significant overhead factors that must be contended with. This paper presents an overview of AFIT's initial research on agenda parallelism; it includes a discussion of the advantages and liabilities associated with this decomposition approach based on formal proofs, problem analysis and actual implementation.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124147767","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear Speedup of Winograd's Matrix Multiplication Algorithm Using an Array Processor","authors":"De-Lei Lee, M. A. Aboelaze","doi":"10.1109/DMCC.1991.633203","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633203","url":null,"abstract":"Winograd's matrix multiplication algorithm halves the number of multiplication operations required of the conventional O(N^3) matrix multiplication algorithm by slightly increasing the number of addition operations. Such a technique can be computationally advantageous when the machine performing the matrix computation takes much more time for multiplication over addition operations. This is overwhelmingly the case in the massively parallel computing paradigm, where each processor is extremely simple by itself and the computing power is obtained by the use of a large number of such processors. In this paper, we describe a parallel version of Winograd's matrix multiplication algorithm using an array processor and show how to achieve nearly linear speedup over its sequential counterpart.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"604 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116373773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Parallel Programming Paradigms for Structuring Programs on Distributed Memory Computers","authors":"A. W. Kwan, L. Bic","doi":"10.1109/DMCC.1991.633127","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633127","url":null,"abstract":"Programming paradigms have been advocated as a method of abstraction for viewing parallel algorithms. By viewing such paradigms as a method of algorithm classification, we have used paradigms as a technique for structuring certain types of algorithm on distributed memory computers, allowing for separation of computation and synchronization. The structuring technique assists the parallel programmer with synchronization, allowing the programmer to concentrate more on developing code for computation. Experiments with the compute-aggregate-broadcast paradigm indicate that such a structuring technique can be utilized for different programs, and can be efficient.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123713165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Parallel-Vector Algorithm for Solving Periodic Tridiagonal Linear Systems of Equations","authors":"T. Taha","doi":"10.1109/DMCC.1991.633307","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633307","url":null,"abstract":"Periodic tridiagonal linear systems of equations typically arise from discretizing second order differential equations with periodic boundary conditions. In this paper a parallel-vector algorithm is introduced to solve such systems. Implementation of the new algorithm is carried out on an Intel iPSC/2 hypercube with vector processor boards attached to each node processor. It is to be noted that this algorithm can be extended to solve other periodic banded linear systems.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124865387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Helmholtz Finite Elements Performance On Mark III and Intel iPSC/860 Hypercubes","authors":"J. Parker, T. Cwik, R. Ferraro, P. Liewer, P. Lyster, J. Patterson","doi":"10.1109/DMCC.1991.633158","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633158","url":null,"abstract":"The large distributed memory capacities of hypercube computers are exploited by a finite element application which computes the scattered electromagnetic field from heterogeneous objects with size large compared to a wavelength. Such problems scale well with hypercube dimension for large objects: by using the Recursive Inertial Partitioning algorithm and an iterative solver, the work done by each processor is nearly equal and communication overhead for the system set-up and solution is low. The application has been integrated into a user-friendly environment on a graphics workstation in a local area network including hypercube host machines. Users need never know their solutions are obtained via a parallel computer. Scaling is shown by computing solutions for a series of models which double the number of variables for each increment of hypercube dimension. Timings are compared for the JPL/Caltech Mark IIIfp Hypercube and the Intel iPSC/860 hypercube. Acceptable quality of solutions is obtained for object domains of hundreds of square wavelengths and resulting sparse matrix systems with order of 100,000 complex unknowns.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116692409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient All-to-All Communication Patterns in Hypercube and Mesh Topologies","authors":"D. Scott","doi":"10.1109/DMCC.1991.633174","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633174","url":null,"abstract":"Some application programs on distributed memory parallel computers occasionally require an \"all-to-all\" communication pattern, where each compute node must send a distinct message to each other compute node. Assuming that each node can send and receive only one message at a time, the all-to-all pattern must be implemented as a sequence of phases in which certain nodes send and receive messages. If there are p compute nodes, then at least p-1 phases are needed to complete the operation. A proof of a schedule achieving this lower bound on a circuit switched hypercube with fixed routing is given. This lower bound cannot be achieved on a 2-dimensional mesh. On an a×a mesh, a^3/4 is shown to be a lower bound and a schedule with this number of phases is given. Whether hypercubes or meshes are better for this algorithm depends on the relative bandwidths of the communication channels.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132329335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing the Perfect ARC2D Benchmark on the BBN TC2000 Parallel Supercomputer","authors":"S. Breit","doi":"10.1109/DMCC.1991.633200","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633200","url":null,"abstract":"The TC2000 is a MIMD parallel processor with memory that is physically distributed, but logically shared. Interprocessor communication, and therefore access to shared memory, is sufficiently fast that most applications can be ported to the TC2000 without rewriting the code from scratch. This paper shows how this was done for the Perfect ARC2D benchmark. The code was first restructured by changing the order of subroutine calls so that interprocessor communication would be reduced to the equivalent of three full transposes of the data per iteration. The parallel implementation was then completed by inserting shared data declarations and parallel extensions provided by the TC2000 Fortran language. This approach was easier to implement than a domain decomposition technique, but requires more interprocessor communication. It is feasible only because of the TC2000's high-speed interprocessor communications network. References to shared memory take about 25% of the total execution time for the parallel version of ARC2D, an acceptable amount considering the code did not have to be completely rewritten. High parallel efficiency was obtained using up","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130116534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Software Engineering Aspects of the ProSolver-SES Skyline Solver","authors":"E. Castro-Leon, M. L. Barton, E. Kushner","doi":"10.1109/DMCC.1991.633170","DOIUrl":"https://doi.org/10.1109/DMCC.1991.633170","url":null,"abstract":"The ProSolver-SES software is one of the direct equation solvers available for the iPSC/860. It uses skyline storage of matrix elements, and is applicable to linear systems that do not require pivoting. The product is available as a library that includes additional operations to support Finite Element Method applications. This paper discusses the software architecture and some of the high performance algorithms.","PeriodicalId":313314,"journal":{"name":"The Sixth Distributed Memory Computing Conference, 1991. Proceedings","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1991-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130494114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}