{"title":"Solution to an architectural problem in parallel computing","authors":"D. Lee","doi":"10.1109/FMPC.1990.89494","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89494","url":null,"abstract":"The author presents a solution to the previously unsolved problem of how to construct an array processor with N processing elements, N memory modules, and an interconnection network that allows parallel access and alignment of rows, columns, diagonals, contiguous blocks, and distributed blocks of N*N arrays. The solution leads to an array processor that is both simple and efficient in two critical respects: the memory system uses the minimum number of memory modules to achieve conflict-free memory access and is able to compute N addresses with O(log N) logic gates in O(1) time. The interconnection network is multistage with O(N log N) logic gates, and it can align any of these data vectors for store/fetch, as well as for subsequent processing with a single pass through the network.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115290559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Porting an iterative parallel region growing algorithm from the MPP to the MasPar MP-1","authors":"J. Tilton","doi":"10.1109/FMPC.1990.89456","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89456","url":null,"abstract":"An iterative parallel region growing (IPRG) algorithm, developed and implemented on the massively parallel processor (MPP) at NASA Goddard, is described. The experience of porting the IPRG algorithm from the MPP to the MasPar MP-1 is related. Porting was very easy and straightforward, especially when the Dorband virtualization software was used. The porting discussed, consisting of 1879 lines of MPL code, was accomplished in just two weeks by the author. The major difference between the two implementations is that the looping over virtual parallel arrays had to be done explicitly and had to be the outermost loop (for efficiency) in the MPP Pascal implementation, whereas the same looping was done implicitly in the MPL implementation and could be done in the innermost loop. In a performance test on a 256*256 pixel section of a seven-band Landsat thematic mapper image data set, the smaller MasPar MP-1 computer had roughly the same or better performance as the MPP. In the initial iterations, when the regions were still very small, the MPP was about 25% faster than the MasPar MP-1. By iteration 14, the MasPar MP-1 was 33% faster than the MPP, and for ensuing iterations indications are that the MasPar MP-1 speedup versus the MPP will be even larger.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129255601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A new computational model for massive parallelism","authors":"S. Berkovich","doi":"10.1109/FMPC.1990.89466","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89466","url":null,"abstract":"A universal parallel computational model using the concept of distributed associative processing (DASP) is presented. The effective realization of this model is due to unique facilities of a multiaccess content-induced transaction overlap communication technique. The operative mechanism of the model uses a plurality of processing nodes with content-addressable transmission buffers. This construction can be incrementally expanded through the hierarchical multiplexing of information pathways. The developed principles simplify hardware/software organization and ensure performance gains for a variety of computational schemes.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129176293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiple channel architecture","authors":"T. S. Wailes, D. Meyer","doi":"10.1109/FMPC.1990.89477","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89477","url":null,"abstract":"A parallel processing architecture based on multiple-channel optical communication is described. A large number of independent, selectable channels (or virtual buses) are available using a single optical fiber. Arbitrary interconnection patterns, as well as machine partitions, can be emulated by using appropriate channel assignments. Hierarchies of parallel architectures and simultaneous execution of parallel tasks are also possible. Recent technological advances in semiconductor laser technology that make such a parallel architecture possible, a basic system overview, various channel allocation strategies, and a summary of advantages compared with traditional interconnection techniques are presented.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"185 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114742820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An interconnection network and a routing scheme for a massively parallel message-passing multicomputer","authors":"C. Germain, Jean-Luc Béchennec, D. Etiemble, J. Sansonnet","doi":"10.1109/FMPC.1990.89484","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89484","url":null,"abstract":"The communication system of a massively parallel architecture called MEGA is presented. The implications of massive parallelism for routing strategies and communication models are discussed. A routing strategy, called forced routing, is proposed. It minimizes contention by making full use of all the shortest paths in the network. Its performance has been studied by simulation, and the results are presented. The mixed communication model used allows processes with mutual reference to have direct information exchanges. The routing strategy can be implemented within a restricted chip area in CMOS technology.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"112 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124285693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance prediction-How good is good?","authors":"B. Stramm, Francine Berman","doi":"10.1109/FMPC.1990.89435","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89435","url":null,"abstract":"The prediction of performance of parallel algorithms mapped to parallel computers is addressed. Performance prediction models are parameterized by the algorithms, mapping, and target machine. A mechanism for comparing the accuracy of the models that establishes a partial order between them is proposed.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115061138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High performance mapping for massively parallel hierarchical structures","authors":"Sotirios G. Ziavras","doi":"10.1109/FMPC.1990.89467","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89467","url":null,"abstract":"Techniques for mapping image processing and computer vision algorithms onto a class of hierarchically structured systems are presented. In order to produce mappings of maximum efficiency, objective functions that measure the quality of given mappings with respect to particular optimization goals are proposed. The effectiveness and the computation complexity of mapping algorithms that yield very high performance by minimizing the objective functions are discussed. Performance results are also presented.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117219582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The StarLite project","authors":"E. llEEllEEliI, James G. Smith","doi":"10.1109/FMPC.1990.89501","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89501","url":null,"abstract":"The components and design of the StarLite programming environment for Modula-2 and several projects that are being developed using StarLite are discussed. StarLite emulates a 1000-node, shared virtual memory multiprocessor on a single hardware processor. The parallel programming package is discussed, and some of the operating system design problems for a massively parallel computer are reviewed.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125943771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An asynchronous multiprocessor design for branch-and-bound algorithms","authors":"Kam-Hoi Cheng, Q. Wang","doi":"10.1109/FMPC.1990.89440","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89440","url":null,"abstract":"A fast asynchronous multiprocessor system designed to implement branch-and-bound algorithms is described. Cooperating processors are only responsible for performing computation essential to the problem. Dynamic sharing of work and coordinate among processors are provided by several servers, all of which are capable of handling multiple accesses simultaneously. It is shown how to coordinate the use of these designs and prove the correctness of the authors' solution in reactivating idle processors and detecting the termination of the computation. The loss of computation power due to the uneven work-load distribution, coordination, and synchronization of processors has been reduced significantly compared with other hardware designs.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124779523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Comparative performance evaluation of a new SIMD machine","authors":"J. M. Jennings, Edward W. Davis, R. A. Heaton","doi":"10.1109/FMPC.1990.89468","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89468","url":null,"abstract":"The performance of BLITZEN, a new massively parallel machine, is compared with that of the Massively Parallel Processor (MPP) for two image-processing functions: rotation and resampling. These functions, as implemented on the MPP, were modified to exploit new architectural features of BLITZEN. The functional simulator of BLITZEN, used for algorithm development and timing information, is described. A performance comparison based on instruction cycle counts shows a significant speedup for the new machine due to architectural features that improve data movement capability.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127133636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}