PRAM programming: in theory and in practice

Concurr. Pract. Exp. | Pub Date: 2000-04-10 | DOI: 10.1002/(SICI)1096-9128(20000410)12:4%3C211::AID-CPE477%3E3.0.CO;2-R

David Lecomber, Constantinos J. Siniolakis, K. R. Sujithan

Citations: 4

Abstract

That the influence of the PRAM model is ubiquitous in parallel algorithm design is as clear as the fact that it is technologically infeasible for the foreseeable future. The current generation of parallel hardware prominently features distributed memory and high-performance interconnection networks, very much the antithesis of the shared memory required for the PRAM model. It has been shown that, in spite of communication costs, for some problems very fast parallel algorithms are available for distributed-memory machines, from embarrassingly parallel problems to sorting and numerical analysis. In contrast, it is known that for other classes of problem PRAM-style shared-memory simulation on a distributed-memory machine can, in theory, produce solutions of comparable performance to the best possible for such architectures.

The Bulk Synchronous Parallel (BSP) model accurately represents most parallel machines, theoretical and actual, in an execution and cost model. We introduce a scalable portable PRAM realization appropriate for BSP computers and a methodology for usage. Our system is fast and built upon the familiar sequential C++ coupled with the new standard BSP library of parallel computation and communication primitives. It is portable to and predictable on a vast number of parallel computers including workstation clusters, a 256-processor Cray T3D, an 8-node IBM SP/2 and a 4-node shared-memory SGI Power Challenge machine. Our approach achieves simplicity of programming over direct-mode BSP programming for reasonable overhead cost. We objectively compare optimized BSP and PRAM algorithms implemented with our C++ PRAM library and provide encouraging experimental results for our new style of programming.

Copyright © 2000 John Wiley & Sons, Ltd.
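The abstract refers to BSP as an execution and cost model without stating the cost formula. As a rough illustration only, the sketch below uses the textbook BSP formulation, in which one superstep costs `w + h*g + l`: `w` is the largest local work on any processor, `h` is the largest number of words any processor sends or receives, `g` is the per-word communication cost, and `l` is the barrier synchronization cost. The type and function names (`BspMachine`, `ProcessorLoad`, `superstep_cost`) are hypothetical and are not taken from the paper or from the BSP library it uses.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Machine parameters of the textbook BSP cost model (illustrative names).
struct BspMachine {
    double g;  // communication cost per word, in units of one local operation
    double l;  // cost of one barrier synchronization, in the same units
};

// What one processor does during a single superstep.
struct ProcessorLoad {
    double work;                 // local computation
    std::size_t words_sent;      // words sent to other processors
    std::size_t words_received;  // words received from other processors
};

// Cost of one superstep: w + h*g + l, where w is the maximum local work and
// h is the maximum number of words any single processor sends or receives.
double superstep_cost(const BspMachine& m,
                      const std::vector<ProcessorLoad>& loads) {
    double w = 0.0;
    std::size_t h = 0;
    for (const ProcessorLoad& p : loads) {
        w = std::max(w, p.work);
        h = std::max(h, std::max(p.words_sent, p.words_received));
    }
    return w + static_cast<double>(h) * m.g + m.l;
}
```

Under this model, the overhead of simulating PRAM-style shared memory shows up mainly in the `h*g` and `l` terms, since every remote memory access becomes communication that completes at a barrier. This is a reading of the general model, not a result reported in the paper.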
