{"title":"Quantitative studies of processing element granularity","authors":"T. C. Marek, E. Davis","doi":"10.1109/FMPC.1992.234925","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234925","url":null,"abstract":"Quantitative results of experiments on PE (processing element) granularities are presented. An architecture simulation workbench has been developed for experiments on PE granularities of 1, 4, 8, and 16 bits. An analysis of the impact of various I/O (input/output) and communication path widths is also possible. Overall performance, communication balance, PE utilization, and operand lengths can be monitored to evaluate the merits of various granularities and feature sets. This workbench has been used to run a set of benchmark algorithms that cover a range of computation and communication requirements, a range of data sizes, and a range of problem array sizes. The authors report results for two of the algorithms studied by T.C. Marek (1992): image rotation and image resampling. The results obtained are counterintuitive. They indicate that bit-serial machines have performance advantages due both to inherent bit-oriented activity, even when using multiple-bit operands, and to inter-PE communication when paths are narrower than the processor granularity.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"78 12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129763369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient algorithms for locating a core of a tree network with a specified length","authors":"S. Peng, W. Lo","doi":"10.1109/FMPC.1992.234904","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234904","url":null,"abstract":"The authors present efficient algorithms for finding a core of a tree with a specified length for both sequential and parallel computational models. The algorithms can be readily extended to a tree network in which arcs have nonnegative integer lengths. The authors also present a parallel version of the algorithm on an EREW PRAM (parallel random access machine) model. The results presented might provide a basis for the study of other facility shapes such as trees and forests of fixed sizes.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133399409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optical interconnects for multiprocessors: cost performance trade-offs","authors":"P. Lalwaney, L. Zenou, A. Ganz, I. Koren","doi":"10.1109/FMPC.1992.234948","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234948","url":null,"abstract":"The authors demonstrate the performance advantages of wavelength division multiplexing (WDM) based optical interconnects in the face of partial structures dictated by the hardware restrictions of the currently available technology. Because the cost of optical communication hardware for WDM-star-based interconnects may be high, reduced-cost structures have been introduced. The performance of the optical implementations of the reduced-cost structures is compared to that of the electronic implementations for the hypercube topology. The performance is compared in terms of the communication overhead in implementing two commonly used algorithms on these structures. Results indicate that, in most situations, the optically implemented reduced-cost variations perform better than the electronic implementations. Moreover, the hardware cost-performance tradeoffs show that among the optically implemented schemes, the performance degradation of the reduced-cost variations is not significant in view of the hardware savings involved.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114287904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perspectives on massively parallel computation","authors":"H. Siegel, L. Valiant, P. Woodward, M. J. Flynn, L.M. Ni","doi":"10.1109/FMPC.1992.234875","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234875","url":null,"abstract":"The areas of algorithms, applications, architectures, and system software are discussed with reference to massively parallel computation.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"233 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114992491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Establishing an MPP guidepost","authors":"S. Nelson","doi":"10.1109/FMPC.1992.234878","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234878","url":null,"abstract":"It is noted that no system has appeared which has been recognized by the high-performance computing community as the guidepost for massively parallel processing (MPP). The author describes what is necessary for a system to become such a guidepost. A comparison is made to the CRAY-1, a system which serves as a guidepost for vector computing and serves as a standard by which all types of high-performance computing systems are measured. It is claimed that establishing an MPP guidepost requires building a system that delivers on the promised potential of scalable parallel computing, by providing high sustained performance on a wide variety of production applications, with very little programming effort.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114419029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Staggered distribution: a loop allocation scheme for dataflow multiprocessor systems","authors":"J. T. Lim, A. Hurson, B. Lee, B. Shirazi","doi":"10.1109/FMPC.1992.234944","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234944","url":null,"abstract":"The authors present a staggered distribution scheme for DOACROSS loops. The scheme uses heuristics to distribute the loop iterations unevenly among processors in order to mask the delay caused by data dependencies and inter-PE (processing element) communication. Simulation results have shown that this scheme is effective for loops that have a large degree of parallelism among iterations. The scheme, due to its nature, distributes loop iterations among PEs based on architectural characteristics of the underlying organization, i.e. processor speed and communication cost. The maximum speedup attained is very close to the maximum speedup possible for a particular loop even in the presence of inter-PE communication cost. This scheme utilizes processors more efficiently, since, relative to the equal distribution approach, it requires fewer processors to attain maximum speedup. Although this scheme produces an unbalanced distribution among processors, this can be remedied by considering other loops when making the distribution to produce a balanced load among processors.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124503261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embedding multilevel structures into massively parallel hypercubes-connection machine results for computer vision algorithms","authors":"Sotirios G. Ziavras","doi":"10.1109/FMPC.1992.234913","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234913","url":null,"abstract":"Investigates the problem of embedding multilevel structures into hypercubes. The widely used pyramid belongs to the class of multilevel structures. Although several algorithms have been proposed for embedding pyramids into hypercubes, there do not exist algorithms for embedding general multilevel structures. For the special case of the pyramid, this research carries out a comparative analysis that involves four embedding algorithms. Results for a Connection Machine system CM-2 containing 16384 processors are presented, including the general case.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"360 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122770144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal implementation of parallel divide-and-conquer algorithms on de Bruijn networks","authors":"Xiaoxiong Zhong, V. Lo, S. Rajopadhye","doi":"10.1109/FMPC.1992.234914","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234914","url":null,"abstract":"Studies the problem of optimal implementation of parallel divide-and-conquer algorithms on binary de Bruijn networks. A divide-and-conquer algorithm is modeled as a temporal complete binary tree computation structure. An important contraction property between two successive binary de Bruijn networks is revealed. A twice-size complete binary tree is mapped to a de Bruijn network. Two nodes in the complete binary tree are mapped to a single node. The mapping is of dilation one, communication contention free and of good load balance.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123852068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MPPs, Amdahl's law, and comparing computers","authors":"M. Annaratone","doi":"10.1109/FMPC.1992.234879","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234879","url":null,"abstract":"The author examines Amdahl's law in the context of parallel processing and provides some arguments as to what the applicability of this law really is. Amdahl's law establishes an upper bound on the available parallelism given the fraction of sequential code present in an application. In this paper, Amdahl's law is revisited to derive a formulation which allows one to carry out some quantitative analysis. The claim that MPPs (massively parallel processors) are special-purpose systems is also addressed.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"163 11-12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114037958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compiling Fortran 77D and 90D for MIMD distributed-memory machines","authors":"A. Choudhary, G. Fox, S. Ranka, S. Hiranandani, K. Kennedy, C. Koelbel, C. Tseng","doi":"10.1109/FMPC.1992.234911","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234911","url":null,"abstract":"The authors present an integrated approach to compiling Fortran 77D and Fortran 90D programs for efficient execution on MIMD (multiple-instruction multiple-data) distributed-memory machines. The integrated Fortran D compiler relies on two key observations. First, array constructs may be scalarized into FORALL loops without loss of information. Second, loop fusion, partitioning, and sectioning optimizations are essential for both Fortran D dialects. A portable run-time library can also reduce the complexity and machine-dependence of the compiler. All optimizations except coarse-grain pipelining and data prefetching have been implemented in the current Fortran D compiler prototype.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"125 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122375583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}