{"title":"Efficient masking techniques for large-scale SIMD architectures","authors":"W. Nation, S. Fineberg, M. Allemang, T. Schwederski, T. Casavant, H. Siegel","doi":"10.1109/FMPC.1990.89469","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89469","url":null,"abstract":"SIMD (single-instruction-stream, multiple-data-stream) architectures require mechanisms that efficiently enable and disable mask processors to support flexible programming. Most current SIMD architectures use local masking. Global processor masks, specified by the control unit, are more efficient for tasks where the masking is data independent. An efficient hybrid masking technique that supports global masking, as well as local masking, for SIMD architectures constructed from standard microprocessors is proposed. A design for the hybrid mechanism is described, and its experimental performance using the existing PASM prototype is examined. It is shown that the hybrid masking technique can increase the utilization of PEs and thus increase performance, the degree of improvement being algorithm dependent.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"143 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125774968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Digital Transform Machine","authors":"W. W. Kirkman","doi":"10.1109/FMPC.1990.89470","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89470","url":null,"abstract":"The Digital Transform Machine, a massively parallel computer architecture based on a configurable hardware model of processing, is discussed. Some of the implications of this model of computing are examined, and the cellular structure and interconnection network of a proof-of-concept computer based on it are described. Areas that merit particular attention in future research are identified.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133810381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On bit-serial packet routing for the mesh and the torus","authors":"F. Makedon, A. Simvonis","doi":"10.1109/FMPC.1990.89475","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89475","url":null,"abstract":"The bit-serial routing problem, wherein each packet consists of a sequence of k flits and is thus called a snake, is considered. On the basis of the properties of the snake during the routing, a formal definition is given for three different packet routing models, namely, the store-and-forward model, the cut-through model, and the wormhole model. The wormhole model, which is most commonly used in practice, is studied. The first algorithms (deterministic and probabilistic) based on the wormhole model for the permutation routing problem on a chain, on a square mesh, and on a square torus are given. A new lower bound is derived for distance-limited permutation routing on a ring of processors, and an algorithm that matches this lower bound if the packets are routed independently is given.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131747591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Vcode: a data-parallel intermediate language","authors":"G. Blelloch, Siddhartha Chatterjee","doi":"10.1109/FMPC.1990.89498","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89498","url":null,"abstract":"A description is given of Vcode, a data-parallel intermediate language. Vcode is designed to allow easy porting of data-parallel languages to a wide class of parallel machines, and for experimenting with compiling such languages. The design goal was to define a simple language whose primitives can be implemented efficiently but that is still powerful enough to express the features of existing data-parallel languages. Vcode contains about 50 instructions, most of which manipulate arbitrarily long vectors of atomic values, and includes a set of segmented instructions that are crucial for implementing data-parallel languages that permit nested parallelism. The design decisions are discussed, and it is shown how three data-parallel languages (C*, Fortran 8*, and Paralation Lisp) can be mapped onto Vcode. The issues encountered in implementing Vcode on different kinds of parallel machines, as well as specific techniques for implementing it on the Connection Machine, are examined.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115470478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"What are the two most important issues facing the design and use of massively parallel computers?","authors":"H.J. Siegel, K. Batcher, C. Brownstein, W. J. Camp, M. Halem, J. Harris, R. Miller, D. Parkinson, A. P. Reeves, J. Reif, A. Rosenfeld, D. Schaefer, I.D. Scherson, P. Schneck, G. Steele, L. Uhr, U. Vishkin","doi":"10.1109/FMPC.1990.89507","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89507","url":null,"abstract":"A variety of views is presented by the participants in this panel discussion. Concerns are expressed regarding communication, control, software, programming, cost, and performance measures, among others. The responses reflect the varied backgrounds and perspectives of the panelists.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122347832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A bit-parallel, word-parallel, massively parallel associative processor for scientific computing","authors":"B. Alleyne, D. Kramer, I. Scherson","doi":"10.1109/FMPC.1990.89457","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89457","url":null,"abstract":"A simple but powerful parallel architecture based on the classical associative processor model, which allows bit-parallel computation and communication, is proposed. Complex operations such as multiplication execute in O(m) cycles, as opposed to O(m^2) for bit-serial machines. This permits very fast processing of floating-point data. A bit-parallel communication network that exploits associative data location independence is presented. It provides the system with a reconfiguration capability, which improves chip yield, as well as fault tolerance. The simplicity of the architecture lends itself to VLSI implementation and hence allows the construction of a bit-parallel, word-parallel, and massively parallel (P^3) computing system.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"84 1-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123525921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simulating numerically controlled machining in parallel","authors":"P. Su, S. Drysdale","doi":"10.1109/FMPC.1990.89443","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89443","url":null,"abstract":"Several parallel algorithms for simulating numerically controlled machining are presented. Various implementations of these algorithms on the Connection Machine are discussed. These experiments provide information about the various performance tradeoffs involved in writing programs for the Connection Machine. They also show that this particular problem is well suited to parallel solutions, since the algorithms run much faster than previous sequential algorithms.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126047840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional and topological relations among banyan multistage networks of differing switch sizes","authors":"A. Youssef, B. Arden","doi":"10.1109/FMPC.1990.89474","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89474","url":null,"abstract":"If two N×N networks W and W' have switch sizes r and s, respectively, and if r > s, then W realizes a larger number of permutations than W'. Consequently, the two networks can never be equivalent. However, W may realize all the permutations of W', in which case W is said to functionally cover W' in the strict sense. More generally, W is said to functionally cover W' in the wide sense if the terminals of W can be relabeled so that W realizes all the permutations of W'. Functional covering is topologically characterized, and an optimal algorithm to decide strict functional covering is developed. It is shown that any N×N digit permutation network of switch size r functionally covers in the wide sense any other N×N digit permutation network of switch size s if and only if r is a perfect power of s, where a digit permutation network is a banyan multistage network such that the interconnections are permutations that permute digits in a specified manner.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"2018 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125843878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mapping reusable software components onto the ARC parallel processor","authors":"L. Welch, B. Weide","doi":"10.1109/FMPC.1990.89502","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89502","url":null,"abstract":"It is shown how to map the components of a program onto the ARC (Architecture for Reusable Components) processor automatically in a way that exploits its features. Mapping consists of two phases. The first phase determines the maximum amount of parallelism attainable from a program in the model of parallel execution. This is done by mapping program components onto logical processors (of which there are an infinite number). The second phase maps the contents of the logical processors onto physical processors (of which there are a limited number). It is shown how to (1) identify the distributable components of the system, (2) determine the relevant relationships among the components, (3) model the maximum amount of parallelism attainable with the model of parallel execution used, and (4) use the information from steps 1-3 to map components onto the processor nodes of ARC. Previous related work is reviewed.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121708021","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image reconstruction on hypercube computers","authors":"E. Zapata, I. Benavides, F. F. Rivera, J. Bruguera, J. Carazo","doi":"10.1109/FMPC.1990.89448","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89448","url":null,"abstract":"The problem of the 3-D reconstruction of an object from its 2-D projection images using filtered backprojection is addressed. The implementation of the filtered backprojection method on hypercube computers is analyzed. It is shown that the parallel algorithm is general in the sense that it does not impose any restriction on the problem space dimensions and is adaptable to any hypercube dimension. The flexibility of the algorithm is rooted in the methodology developed for embedding algorithms into hypercubes. The algorithmic complexity is analyzed. Because the data redundancy associated with the replication of the projection images in all the PEs has allowed the process of simple backprojection to be designed without routing, an optimum algorithmic complexity is obtained.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134131316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}