{"title":"Concurrent processing with result sharing: model, architecture, and performance analysis","authors":"S. Krishnaprasad, B. Shirazi","doi":"10.1109/FMPC.1990.89499","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89499","url":null,"abstract":"An efficient computing model, called concurrent processing with result sharing, is introduced. An architecture suitable for executing programs under this model is developed. A performance analysis of this architecture based on a queuing network model is presented to investigate the effect of problem dynamics on the speed of problem solving and the resource requirements. The analysis indicates that, for both coarse- and fine-grain computations, as the amount of recomputation increases, the number of function units needed decreases and the delay at the processor element decreases significantly. For fine-grain computation, the bottlenecks at either the matching unit or the instruction store significantly degrade the system performance. This can only be avoided by using a more expensive multiple-ring architecture.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"134 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129174940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A silicon compiler for massively parallel image processing ASICs","authors":"A. Boubekeur, G. Saucier","doi":"10.1109/FMPC.1990.89427","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89427","url":null,"abstract":"A silicon compiler design methodology for massively parallel architecture for image processing is introduced. It starts from an algorithmic description of the application in a language comparable to the GAPP NCR language (GAL) and generates an optimized circuit organized as a 2-D array of 1-b processing elements with minimized resources. The effectiveness of the approach is shown by two examples. The first is an ASIC (application-specific integrated circuit) for two basic mathematical morphology operations, dilation and erosion. The second is an ASIC for convolution. Both have been implemented in a double-aluminium 2- mu m CMOS standard cell. In both cases the processor element has been found to be very effective. Considerable area savings have been achieved.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124224243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large integer multiplication on massively parallel processors","authors":"B. Fagin","doi":"10.1109/FMPC.1990.89434","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89434","url":null,"abstract":"Results obtained by multiplying large integers using the Fermat number transform are presented. The effectiveness of the approach was previously limited by word-length constraints, which are not a factor with many new computer architectures. A convolution algorithm on a massively parallel processor, based on the Fermat number transform, is presented. Examples of the tradeoffs between modulus, interprocessor communication steps, and input size are given. The application of this algorithm in the multiplication of large integers is then discussed, and performance results on a Connection Machine are reported. The results show multiplication times ranging from about 50 ms for 2-kb integers to 2600 ms for 8-Mb integers.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129387226","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Massively parallel auction algorithms for the assignment problem","authors":"J. Wein, S. Zenios","doi":"10.1109/FMPC.1990.89444","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89444","url":null,"abstract":"Alternative approaches to the massively parallel implementation of D.P. Bertsekas' auction algorithm (see Ann. Oper. Res., vol.14, p.105-23, 1988) on the Connection Machine CM2 are discussed. The most efficient implementation is a hybrid Jacobi/Gauss-Seidel implementation. It exploits two different levels of parallelism and an efficient way of communicating the data between them without the need to perform general router operations across the hypercube network. The implementations are evaluated empirically, solving large, dense problems.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"354 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126972147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploitation fine-grain parallelism in a combinator-based functional system","authors":"P. Chu, J. Davis","doi":"10.1109/FMPC.1990.89500","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89500","url":null,"abstract":"A Scheme to extend the lazy functional language SASL with an eager evaluation operator that allows the programmer to selectively identify expressions to be evaluated eagerly is developed. D.A. Turner's (1979) abstraction and optimization algorithms are then modified so that the eagerness information will propagate through the combinator instruction set to the run-time parallel graph reducer. Simulation of simple benchmark programs shows this method to be very effective in exploiting fine-grain parallelism, even in irregular and unstructured operation. The evaluation is done on a virtual system. Despite the distributive nature of the combinator scheme, it is still unclear how to map the virtual machine into a physical architecture efficiently without seriously degrading the performance.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129839357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Random number generators with inherent parallel properties","authors":"T. L. Yu, K. W. Yu","doi":"10.1109/FMPC.1990.89433","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89433","url":null,"abstract":"By incorporating the spatial variable into a one-dimensional array of numbers, it is possible to generalize the well-known linear congruential random-number generator (LCG) to the spatially coupled random-number generator (SCG) given by X/sub i/(t+1)=f((X/sub i/(t))) (mod m) where i=1, 2, . . ., n can be regarded as spatial sites and f is a function of (X/sub i/) that denotes a set containing X/sub i/ and its neighbors. It was found that SCGs in general possess a very long period. Statistical and spectral tests on these SCGs show that they are excellent pseudorandom-number generators. The SCGs also have inherent parallel properties and are particularly efficient when implemented on parallel machines.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131958885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework for efficient execution of array-based languages on SIMD computers","authors":"J. Prins","doi":"10.1109/FMPC.1990.89497","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89497","url":null,"abstract":"The author presents a framework for supporting efficient execution of machine-independent, array-based, data-parallel languages, such as Fortran-90 and Parallel Pascal, on distributed-memory SIMD (single-instruction-stream, multiple-data-stream) machines with mesh or hypercube interconnection topologies. The framework supports (1) a wide class of mappings of arrays into machines, (2) the implementation of many data selection and reorganization operations by manipulation of data descriptors instead of data movement, and (3) the decomposition of required data motions into sequences of efficient nearest-neighbor communications on the mesh. Each of these is discussed, and an application example is given. Related work is examined.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133377565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved mesh algorithms for straight line detection","authors":"Y. Pan, Henry Y. H. Chuang","doi":"10.1109/FMPC.1990.89432","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89432","url":null,"abstract":"The problem of detecting lines in an image with N edge pixels on mesh-connected computers with N processors is considered. Four efficient algorithms that detect lines by performing a Hough transform are presented. The first algorithm runs in O(N/sup 1/2/+n) time on a 2-D mesh, where n is the number of theta values considered. The second algorithm runs in O((N/n)/sup 1/2/+n) time on a 3-D mesh. The third algorithm runs in O(log(N/n)+n) time on an augmented mesh. The fourth algorithm runs in O(n log N/log n) time on a mesh with a reconfigurable bus. All of the algorithms have smaller time complexities than algorithms in the literature.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134078085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A parallel architecture for high speed data compression","authors":"J. Storer, J. Reif","doi":"10.1109/FMPC.1990.89465","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89465","url":null,"abstract":"The authors discuss textural substitution methods. They present a massively parallel architecture for textural substitution that is based on a systolic pipe of 3839 identical processing elements that forms what is essentially an associative memory for strings that can learn new strings on the basis of the text processed thus far. The key to the design of this architecture is the formulation of an inherently top-down serial learning strategy as a bottom-up parallel strategy. A custom VLSI chip for this architecture that is capable of operating at 320-Mb/s has passed all simulations and is being fabricated with 1.2- mu m double-metal technology.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130007253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Array processors with pipelined optical busses","authors":"Zicheng Guo, R. Melhem, R. W. Hall, D. Chiarulli, S. Levitan","doi":"10.1109/FMPC.1990.89479","DOIUrl":"https://doi.org/10.1109/FMPC.1990.89479","url":null,"abstract":"A synchronous multiprocessor architecture based on pipelined optical bus interconnections is presented. The processors are placed in a square grid and are interconnected to one another through horizontal and vertical optical buses. This architecture has an effective diameter as small as two owing to its orthogonal bus connections, and it allows all processors to have simultaneous access to the buses owing to its capability for pipelining messages. Although the resulting architecture is meshlike and uses bus connections, it has a substantially higher bandwidth than conventional and bus-augmented mesh computers. Moreover, it has a simple control structure and is universal in that various well-known multiprocessor interconnections can be efficiently embedded in it. This architecture appears to be a good candidate for hybrid optical-electronic systems in the next generation of parallel computers.<<ETX>>","PeriodicalId":193332,"journal":{"name":"[1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1990-10-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122409629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}