PRAM programming: in theory and in practice

David Lecomber, Constantinos J. Siniolakis, K. R. Sujithan
Concurr. Pract. Exp. (Concurrency: Practice and Experience), vol. 12, no. 4, published 2000-04-10. DOI: 10.1002/(SICI)1096-9128(20000410)12:4<211::AID-CPE477>3.0.CO;2-R
Citations: 4

Abstract

That the influence of the PRAM model is ubiquitous in parallel algorithm design is as clear as the fact that the model is technologically infeasible for the foreseeable future. The current generation of parallel hardware prominently features distributed memory and high-performance interconnection networks, very much the antithesis of the shared memory the PRAM model requires. It has been shown that, despite communication costs, very fast parallel algorithms exist on distributed-memory machines for some problems, ranging from embarrassingly parallel problems to sorting and numerical analysis. For other classes of problem, in contrast, it is known that PRAM-style shared-memory simulation on a distributed-memory machine can, in theory, produce solutions whose performance is comparable to the best possible on such architectures. The Bulk Synchronous Parallel (BSP) model accurately represents most parallel machines, both theoretical and actual, through an execution model and a cost model. We introduce a scalable, portable PRAM realization for BSP computers, together with a methodology for using it. Our system is fast and is built on familiar sequential C++ combined with the new standard BSP library of parallel computation and communication primitives. It is portable to, and its performance is predictable on, a wide range of parallel computers, including workstation clusters, a 256-processor Cray T3D, an 8-node IBM SP/2 and a 4-node shared-memory SGI Power Challenge machine. Compared with direct-mode BSP programming, our approach makes programming simpler at a reasonable overhead cost. We objectively compare optimized BSP algorithms with PRAM algorithms implemented using our C++ PRAM library, and report encouraging experimental results for this new style of programming.
Copyright © 2000 John Wiley & Sons, Ltd.