{"title":"A fast O(k) multicast message routing algorithm","authors":"T. J. Sager, B. McMillin","doi":"10.1109/FMPC.1990.89485","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89485","url":null,"abstract":"An O(k) algorithm for multicast routing that appears to be faster and to create, on average, less traffic than previously presented O(k) multicast routing algorithms is given. The algorithm, called bestfit, is simple enough to be easily implemented in hardware, and, unlike the case for other O(k) multicast algorithms, destinations can be processed as soon as they are received, since the processing of a destination depends only on statistical properties of the destinations already processed. The algorithm basically tries to place each destination on the channel to which it 'fits best'.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127420240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the scalability of FFT on parallel computers","authors":"Abhishek Gupta, V. Kumar","doi":"10.1109/FMPC.1990.89441","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89441","url":null,"abstract":"The scalability of the parallel fast Fourier transform (FFT) algorithm on mesh- and hypercube-connected multicomputers is analyzed. The hypercube architecture provides linearly increasing performance for the FFT algorithm with an increasing number of processors and a moderately increasing problem size. However, there is a limit on the efficiency, which is determined by the communication bandwidth of the hypercube channels. Efficiencies higher than this limit can be obtained only if the problem size is increased very rapidly. Technology-dependent features, such as the communication bandwidth, determine the upper bound on the overall performance that can be obtained from a P-processor system. The upper bound can be moved up by either improving the communication-related parameters linearly or increasing the problem size exponentially. The scalability analysis shows that the FFT algorithm cannot make efficient use of large-scale mesh architectures. The addition of such features as cut-through routing and multicasting does not improve the overall scalability on this architecture.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116831146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel algorithms for 2D Kalman filtering","authors":"D. J. Potter, M. P. Cline","doi":"10.1109/FMPC.1990.89436","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89436","url":null,"abstract":"Methods for implementing 2-D reduced-update Kalman filtering using parallel machines are described. Various types of parallel architectures can be used for the implementation. Algorithms are described and compared for implementation on the Sequent Balance 21, a multiprocessor system with shared memory, CLIP4, a 96×96 SIMD processor array, and the Connection Machine, an SIMD array of 64K processors. All the machines show great improvement in performance. The advantage of the Connection Machine is the individual processor's ability to write and read from different locations. This allows multiple-image filtering and a great increase in the efficiency of processor usage.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131188804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploration of reconfigurable architectures: an empirical approach","authors":"W. Ligon, U. Ramachandran","doi":"10.1109/FMPC.1990.89461","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89461","url":null,"abstract":"An approach to designing computer architectures in which architectural features are analyzed for utility and cost with respect to the system software that uses them is applied to reconfigurable architectures. A notation for classifying reconfigurable architectures that covers three major areas of reconfigurable architecture design (processor reconfiguration, control reconfiguration, and connection reconfiguration) is presented. A simulator based on this notation, called the reconfigurable architecture workbench, is described. This has been used to study the effects of processor reconfiguration within the image-processing application domain. Experimental results showing that some types of processor reconfiguration provide significant speedup over parallel architectures without processor reconfiguration are presented. The engineering problems present in implementing processor reconfiguration are explored, and some solutions to these problems are suggested.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"122 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128416373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Particles: a naturally parallel approach to modeling","authors":"D. House, D. Breen","doi":"10.1109/FMPC.1990.89451","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89451","url":null,"abstract":"Particle-based techniques that are being developed for the mechanical and visual modeling of complex nonrigid materials are described. These are motivated by both the goal of developing a real-time training simulator for arthroscopic surgery and the goal of developing an accurate model of cloth for automatic garment handling. In this approach a material is represented as a large collection of microscopic particles interacting with each other according to simple physical laws operating on a microscopic level. Since modeled materials derive their macroscopic properties from these microscopic interactions, the technique closely parallels the actual discrete structures of nature. The development of software tools for use in experimentation, which has been the focus of initial investigations, is discussed.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131993790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-tolerance and learning performance of the back-propagation algorithm using massively parallel implementation","authors":"P. Murali, H. Wechsler, M. Manohar","doi":"10.1109/FMPC.1990.89483","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89483","url":null,"abstract":"Mapping the backpropagation (BP) algorithm onto an SIMD (single-instruction-stream, multiple-data-stream) machine, such as the Massively Parallel Processor, is considered. It is shown that the size of the connectionist network underlying BP can be scaled up to large sizes, resulting in improved performance. Specifically, both fault tolerance and learning speed can be enhanced.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124108360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rearrangeability of shuffle-exchange networks","authors":"H. Çam, J. Fortes","doi":"10.1109/FMPC.1990.89476","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89476","url":null,"abstract":"A proof for the rearrangeability of (2n-1)-stage shuffle-exchange networks with N=2^n inputs is given. The proof makes use of the notion of balanced matrices for representing passable permutations through a shuffle-exchange network. Because the proof is not constructive, it does not lead to a routing algorithm directly. Therefore, a heuristic algorithm is provided for routing arbitrary permutations on the (2n-1)-stage shuffle-exchange network. A new proof for the rearrangeability of the (2n-1)-stage reduced Ω_N Ω_N^{-1} network is also given, and a routing algorithm using precomputed digit-controlled routing tags is presented.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124212217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Too many cooks don't spoil the broth: light simulation on massively parallel computers","authors":"Peter Kochevar","doi":"10.1109/FMPC.1990.89445","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89445","url":null,"abstract":"A computer graphics algorithm for simulating the propagation of light and its interaction with matter on a massively parallel computer is presented. This algorithm, called the tagged shooting method, is designed for a virtual machine containing a great number of simple communicating processors arrayed into a cubical three-dimensional lattice. Only nearest neighbor communication among processors is assumed, and there is no reliance on global shared memory. The algorithm is similar in spirit to the classical progressive refinement radiosity method designed for more conventional computers but is not an adaptation of that technique to massive parallelism. Instead, the new algorithm uses a discretization of the wave equation as a local rule for shuttling radiant energy values between processors that correspond to regions of space. A number of example images that were created with an implementation of the algorithm on a Connection Machine are depicted and critiqued.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116807471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Application specific massively parallel machine","authors":"Toshiyuki Shibuya, Kaoru Kawamura, Tatsuya Shindo, Hideki Miwatari, Y. Ohki","doi":"10.1109/FMPC.1990.89472","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89472","url":null,"abstract":"The massively parallel layout engine, MAPLE, an SIMD computer that provides very high computing power for CAD applications, is discussed. MAPLE uses up to 64K processors, each having a 32-kB memory, a 512-b data register, and an ALU. The system clock rate is 20 MHz. 64K processors in parallel produce an aggregate of 40 billion 32-b integer additions per second. Parallel routing, the architecture, and the software are covered.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"128 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123160494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achieving multigauge behavior in bit-serial SIMD architectures via emulation","authors":"F. Annexstein, M. Baumslag, M. Herbordt, B. Obrenic, A.L. Rosenberg, C. Weems","doi":"10.1109/FMPC.1990.89459","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89459","url":null,"abstract":"It is shown that the expected benefits of multigauging can be attained without any hardware modification and that additional advantages may be gained from enabling emulations. The authors start with a (physical) bit-serial architecture and build (software) support for multigauge computation on top of its native instruction set. Assumptions about this instruction set are modest and confined solely to its functionality, not its implementation. Multigauge behavior is achieved as a high-level abstraction, which is independent of the physical design. The danger that (hardware-enabled) multigauge behavior might preclude certain types of hardware optimization is avoided.","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124720418","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}