{"title":"Exploiting SIMD computers for general purpose computation","authors":"P. Wilsey, D. Hensgen","doi":"10.1109/IPPS.1992.222985","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222985","url":null,"abstract":"This paper proposes a strategy for exploiting massively parallel SIMD computers for general purpose computation. The approach places compiled programs into the local memory space of each distinct processing element (PE). Within each PE, a local program counter is initialized and the instructions are interpreted in parallel across all of the PEs by control signals emanating from the central control unit. Initial experiments with randomly generated programs show that speedup of approximately 700 is attainable on a SIMD processor with 8 K processing elements. Furthermore, additional experiments have shown that the speedup increases linearly with the number of processing elements.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130206845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Histogramming on a reconfigurable mesh computer","authors":"J. Jenq, S. Sahni","doi":"10.1109/IPPS.1992.223008","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223008","url":null,"abstract":"The authors develop reconfigurable mesh (RMESH) algorithms for window broadcasting, data shifts and consecutive sum. These are then used to develop efficient algorithms to compute the histogram of an image and to perform histogram modification. The histogram of an $N \\times N$ image is computed by an $N \\times N$ RMESH in $O(\\sqrt{B}\\log_{\\sqrt{B}}(N/\\sqrt{B}))$ time for $B < N$, $O(\\sqrt{N})$ for $B = N$, and $O(\\sqrt{B})$ for $N < B \\le N^2$. $B$ is the number of gray scale values. Histogram modification is done in $O(\\sqrt{N})$ time by an $N \\times N$ RMESH.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"44 23","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132737925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Matching algorithms and architecture in hierarchical shared-memory multiprocessor (HSM) systems","authors":"A. Khokhar, M. Dubois","doi":"10.1109/IPPS.1992.222968","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222968","url":null,"abstract":"The authors map several interprocessor communication and linear algebra algorithms on a memory coherent hierarchical shared-memory multiprocessor (HSM) system and their communication complexities are evaluated. The results show that the hierarchical architecture is ill-suited to algorithms exhibiting no temporal locality on data accesses or to the algorithms with point-to-point communication.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132695974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VLSI architectures for recursive and multiple-window order statistic filtering","authors":"M. Hakami, P. Warter, C. Boncelet, David Nassimi","doi":"10.1109/IPPS.1992.223030","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223030","url":null,"abstract":"Based on a recently developed class of sorting networks, new VLSI architectures suitable for order statistic filtering are developed. The major advantage of these architectures is minimal response-time regardless of the number of stages in the pipeline; an effective characteristic for implementing recursive order statistic filters. The devised word-parallel architecture is the only one introduced to date that is capable of operating in both recursive and standard modes with optimal throughput. The proposed architectures are also suitable for implementing order statistic filters with multiple overlapping windows.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"322 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116340099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel algorithm for recognizing the shuffle of two strings","authors":"A. Saoudi, M. Nivat, C. Rangan, Ravi Sundaram, G. D. Ramkumar","doi":"10.1109/IPPS.1992.223062","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223062","url":null,"abstract":"Presents a parallel algorithm for verifying that a string X is formed by the shuffle of two strings Y and Z. The algorithm runs in $O(\\log^2 n)$ time with $O(n^2/\\log^2 n)$ processors on the EREW-PRAM model.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"15 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133876725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling and control of distributed asynchronous computations","authors":"L. Lin, J. Antonio","doi":"10.1109/IPPS.1992.222995","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222995","url":null,"abstract":"A stochastic model for a class of distributed asynchronous fixed point algorithms is presented and a methodology for optimizing the rate of convergence is introduced. An important parameter in the authors' model, called the degree of synchronization, quantifies the average amount of time each processor is willing to wait for information from other processors (before beginning computation of its update variable based on the available estimates of variables from other processors). The authors analyze the relationship between the convergence rate and the degree of synchronization for a class of iterative fixed point algorithms. Preliminary analysis indicates that significant improvements in convergence rates can be achieved by proper control of the parameters in the authors' model.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121727516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A superior class of networks for reconfigurable meshes","authors":"R. Mazzaferri, Heiko Schröder","doi":"10.1109/IPPS.1992.223006","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223006","url":null,"abstract":"To achieve fault tolerance through reconfiguration in mesh-connected arrays many rail networks with vastly different effectiveness and cost have been presented. The authors attempt a unified notation of these networks to allow for their comparative evaluation. They further present a method to improve the effectiveness of fault tolerant networks by combining several small switches into large crossbar switches. This method is applicable to almost all rail networks presented in the literature leading to a significant improvement of effectiveness and often also delay time along the network connections for little hardware cost. Furthermore, the switches used provide fault tolerance of the network itself, which is usually unrealistically assumed to be always fault free.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122380526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized compressed tree machines","authors":"Ajay K. Gupta, Hong Wang","doi":"10.1109/IPPS.1992.223065","DOIUrl":"https://doi.org/10.1109/IPPS.1992.223065","url":null,"abstract":"Parallel machines interconnecting up to thousands of processors have been proposed and recently built. One of the earliest and the most prominent one is a complete binary tree machine. The authors propose a family of tree machines called generalized compressed tree machines. Generalized compressed tree machines may, in general, be viewed as a derivative of the complete binary tree networks.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"95 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124658189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The vesicular dataflow model","authors":"R. Podraza, Dariusz Turlej, K. Piorun","doi":"10.1109/IPPS.1992.222977","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222977","url":null,"abstract":"The Vesicular Dataflow (VDF) model is presented in the paper. The VDF model has been formulated to introduce a way of storing and retrieving information and hence to reduce the main drawback of the basic DF model. Tokens can be stored in vesicles in the VDF model and then distributed in non-deterministic way. State-dependent computations and global variables can be expressed in the dataflow manner. Informal definition of the VDF model and some simple applications are covered by the paper.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124755620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive deadlock-free worm-hole routing in hypercubes","authors":"L. Gravano, G. Pifarré, Gustavo Denicolay, J. Sanz","doi":"10.1109/IPPS.1992.222975","DOIUrl":"https://doi.org/10.1109/IPPS.1992.222975","url":null,"abstract":"Two new algorithms for worm-hole routing in the hypercube are presented. The first hypercube algorithm is adaptive, but non-minimal in the sense that some derouting is permitted. Then another deadlock-free adaptive worm-hole based routing algorithm for the hypercube interconnection is presented which is minimal. Finally some well-known worm-hole algorithms for the hypercube were evaluated together with the new ones on a hypercube of $2^{10}$ nodes. One oblivious algorithm, the Dimension-Order, or E-Cube routing algorithm (W. Dally, C. Seitz, 1987) was tried. In addition, three partially adaptive algorithms were considered: the Hanging algorithm (Y. Birk, P. Gibbons, D. Soroker, J. Sanz, 1989 and S. Konstantinidou, 1990), the Zenith algorithm (S. Konstantinidou, 1990), and the Hanging-Order algorithm (G.-M. Chia, S. Chalasani, C.S. Raghavendra, 1991). Finally, a fully adaptive minimal algorithm presented independently by L. Gravano, G. Pifarre, S.A. Felperin and J. Sanz (1991) and J. Duato was tried. This algorithm allows each message to choose adaptively among all the shortest paths from its source to its destination. Only four virtual channels per physical link are needed to achieve this. This technique is referred to as Fully. The results obtained show that the two new algorithms are good candidates as a choice for worm-hole routing in the hypercube network.","PeriodicalId":340070,"journal":{"name":"Proceedings Sixth International Parallel Processing Symposium","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125012678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}