{"title":"Representing the scaling behavior of parallel algorithm-machine combinations","authors":"D. Rover, Xian-He Sun","doi":"10.1109/FMPC.1992.234919","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234919","url":null,"abstract":"The scaling of algorithms and machines is essential to achieve the goals of high-performance computing. Thus, scalability has become an important aspect of parallel algorithm and machine design. It is a desirable property that has been used to describe the demand for proportionate changes in performance with adjustments in system size. It should provide guidance toward an optimal choice of an architecture, algorithm, machine size, and problem size combination. However, as a performance metric, it is not yet well defined or understood. The paper summarizes several scalability metrics, including one that highlights the behavior of algorithm-machine combinations as sizes are varied under an isospeed condition. A scaling relation is presented to facilitate general mathematical and visual techniques for characterizing and comparing the scalability information of these metrics.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122312135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network design and performance for a massively parallel SIMD system","authors":"S. Darbha, E. Davis","doi":"10.1109/FMPC.1992.234889","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234889","url":null,"abstract":"It is shown that a nearest neighbor communication network can be complimented with a log-diameter multistage network to handle different communications patterns. This is especially useful when the pattern of data movement is not uniform. The designed network is evaluated for two cases: a dense case with many processing elements communicating and a sparse case. For 32-b data, the algorithm for computing partial sums of an array improves by 2.7 times with the multistage interconnection network. In a sparse random case, the number of cycles taken to communicate 32 b is 4000 (with 10% of the nodes communicating). Thus, it is concluded that a network like a multistage omega network is very useful for SIMD (single-instruction multiple-data) massively parallel machines. This is especially true if the machine is to be used for applications where long distance and nonuniform routing patterns are needed.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"96 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122876280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solutions to the phase problem of X-ray crystallography on the Connection Machine CM-2","authors":"C.-S. Chang, G. DeTitta, H. Hauptman, R. Miller, M. Poulin, P. Thuman, C. Weeks","doi":"10.1109/FMPC.1992.234868","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234868","url":null,"abstract":"The authors have developed a formulation of the phase problem of X-ray crystallography in terms of a minimal function of phases and a new minimization algorithm called shake-and-bake for solving this minimal function. The implementation details of the shake-and-bake strategy on the Connection Machine CM-2 are presented. The shake-and-bake algorithm has been used to determine the atomic structure of four test structures, ranging from 28 to 317 atoms. These results indicate that shake-and-bake is effective on structures of this size.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129870625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of hot spot on the performance of multistage interconnection networks","authors":"Mohammed Atiquzzaman, M. S. Akhtar","doi":"10.1109/FMPC.1992.234871","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234871","url":null,"abstract":"Hot spots in multistage interconnection networks (MSINs) results in performance degradation of the network. The authors develop an analytical model for the performance evaluation of unbuffered MSINs under a single hot spot, followed by a performance comparison with buffered MSINs. For uniform traffic, a buffered network performs better than an unbuffered network. For a nonuniform traffic pattern causing congestion (for example, tree saturation) in the network, an unbuffered network outperforms a buffered network. This leads the authors to suggest a hybrid network which will be capable of switching from the buffered mode to the unbuffered mode in the presence of network congestion.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"168 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124684721","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining switches for the NYU Ultracomputer","authors":"S. Dickey, R. Kenner","doi":"10.1109/FMPC.1992.234864","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234864","url":null,"abstract":"A pairwise combining switch has been implemented for use in the 16*16 processor/memory interconnection network of the NYU Ultracomputer prototype. The switch design may be extended for use in very large systems by providing greater combining capability. Methods for doing so are discussed.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123897494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance studies of packet switched augmented shuffle exchange networks","authors":"V. Ramachandran, R. Raines, J.S. Park, N. Davis","doi":"10.1109/FMPC.1992.234920","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234920","url":null,"abstract":"Extends previous research efforts related to the performance modeling of the fault-tolerant Augmented Shuffle Exchange Network (ASEN). The authors examine the ASEN run-time performance characteristics in a packet switched environment. The network performance is examined under a fault-free but congested network operating environment. Network performance parameters of time-in-system, queue lengths and delays, as well as the effects of non-uniform loading of the network are presented. The cost associated with implementation of an ASEN is compared with previously published metrics for the multistage cube network operating under the same environments. The authors conclude that, for the network and operating assumptions defined, the ASEN provides better performance at lower implementation costs than the multistage cube interconnection network.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"165 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120929768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving massively data parallel system performance with heterogeneity","authors":"S. Noh, K. Dussa-Zieger","doi":"10.1109/FMPC.1992.234901","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234901","url":null,"abstract":"The authors introduce a new type of combined SIMD/MIMD (single-instruction multiple-data/multiple-instruction multiple-data) architecture called a hybrid system. The hybrid system consists of two components. The first component is massively parallel and consists of a large number of slow processors that are organized in an SIMD architecture. The second component consists of only a few fast processors (possibly only one) which are organized in an MIMD architecture. The authors contend that a hybrid system provides a means to adequately adjust to the characteristics of a parallel program, i.e., changing parallelism. They describe the machine and application model, and discuss the performance impact of such a system. Viewing the CM-2 with its front-end as a special case of a hybrid system, they substantiate the arguments and report measurements for a Gaussian elimination algorithm.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"183 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121196196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The speedup and efficiency of 3-D FFT on distributed memory MIMD systems","authors":"D. Marinescu","doi":"10.1109/FMPC.1992.234894","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234894","url":null,"abstract":"The author analyzes a 3-D FFT (fast Fourier transform) algorithm for a distributed memory MIMD (multiple-instruction multiple-data) system. It is shown that the communication complexity limits the efficiency even under ideal conditions. The efficiency for the optimal speedup is eta /sub opt/=0.5. Actual applications which experience load imbalance, duplication of work, and blocking are even less efficient. Therefore the speedup with P processing elements, S(P)= eta *P, is disappointingly low. Moreover, the 3-D FFT algorithm is not susceptible to massive parallelization, and the optimal number of PEs is rather low even for large problem size and fast communication. A strategy to reduce the communication complexity is presented.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121652632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boolean function manipulation on massively parallel computers","authors":"G. Cabodi, S. Gai, M. Reorda","doi":"10.1109/FMPC.1992.234869","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234869","url":null,"abstract":"A new algorithm for implementing the basic operations on BDDs (binary decision diagrams) on a massively parallel computer is presented. Each node is associated with a processor, and nodes belonging to the same level are evaluated together. An implementation of the algorithm on a Connection Machine CM2 has been done, and the prototype is being tested on a set of benchmark applications. Experimental results, showing the time required to perform the apply operation on BDDs of growing size demonstrate the exactness of the complexity analysis and the effectiveness of the approach.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127775475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The MetaMP approach to parallel programming","authors":"S. Otto, M. Wolfe","doi":"10.1109/FMPC.1992.234921","DOIUrl":"https://doi.org/10.1109/FMPC.1992.234921","url":null,"abstract":"The authors are researching techniques for the programming of large-scale parallel machines for scientific computation. They use an intermediate-level language, MetaMP, that sits between High Performance Fortran (HPF) and low-level message passing. They are developing an efficient set of primitives in the intermediate language and are investigating compilation methods that can semi-automatically reason about parallel programs. The focus is on distributed memory hardware. The work has many similarities with HPF efforts although their approach is aimed at shorter-term solutions. They plan to keep the programmer centrally involved in the development and optimization of the parallel program.<<ETX>>","PeriodicalId":117789,"journal":{"name":"[Proceedings 1992] The Fourth Symposium on the Frontiers of Massively Parallel Computation","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1992-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126346196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}