Development of Parallel Methods for a $1024$-Processor Hypercube

J. Gustafson, G. Montry, R. Benner
SIAM Journal on Scientific and Statistical Computing, July 1988. DOI: 10.1137/0909041
Citations: 456

Abstract

We have developed highly efficient parallel solutions for three practical, full-scale scientific problems: wave mechanics, fluid dynamics, and structural analysis. Several algorithmic techniques are used to keep communication and serial overhead small as both problem size and number of processors are varied. A new parameter, operation efficiency, is introduced that quantifies the tradeoff between communication and redundant computation. A 1024-processor MIMD ensemble is measured to be 502 to 637 times as fast as a single processor when problem size for the ensemble is fixed, and 1009 to 1020 times as fast as a single processor when problem size per processor is fixed. The latter measure, denoted scaled speedup, is developed and contrasted with the traditional measure of parallel speedup. The scaled-problem paradigm better reveals the capabilities of large ensembles, and permits detection of subtle hardware-induced load imbalances (such as error correction and data-dependent MFLOPS rates) that may become increasingly important as parallel processors increase in node count. Sustained performance for the applications is 70 to 130 MFLOPS, validating the massively parallel ensemble approach as a practical alternative to more conventional processing methods. The techniques presented appear extensible to even higher levels of parallelism than the 1024-processor level explored here.
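The abstract contrasts two speedup measures: fixed-size speedup (total problem size held constant) and scaled speedup (problem size per processor held constant). The sketch below uses the commonly cited textbook forms of both under a simple two-part model: a serial fraction s and a perfectly parallel remainder, with no communication cost. This model, the function names, and the reading of the reported figures as efficiencies are illustrative assumptions, not the paper's measurement methodology. Note that the two formulas define s against different baselines: fixed-size speedup uses the fraction of single-processor time, while scaled speedup uses the fraction of parallel run time.

```python
# Illustrative sketch only: textbook speedup models with a serial fraction s
# and a perfectly parallel remainder. Communication and load imbalance are ignored.


def fixed_size_speedup(s: float, n: int) -> float:
    """Fixed-size (Amdahl-style) speedup on n processors.

    s is the fraction of *single-processor* run time that is serial.
    """
    return 1.0 / (s + (1.0 - s) / n)


def scaled_speedup(s: float, n: int) -> float:
    """Scaled speedup when the problem size per processor is fixed.

    s is the fraction of *parallel* run time that is serial, so
    S = s + (1 - s) * n = n - (n - 1) * s.
    """
    return s + (1.0 - s) * n


def implied_serial_fraction(measured_scaled: float, n: int) -> float:
    """Invert scaled_speedup: s = (n - S) / (n - 1)."""
    return (n - measured_scaled) / (n - 1)


def efficiency(speedup: float, n: int) -> float:
    """Parallel efficiency: speedup divided by processor count."""
    return speedup / n


if __name__ == "__main__":
    n = 1024
    # Speedup figures quoted in the abstract for the 1024-processor ensemble.
    for label, speedup in [("fixed-size", 502), ("fixed-size", 637),
                           ("scaled", 1009), ("scaled", 1020)]:
        print(f"{label:10s} speedup {speedup:5d}: efficiency {efficiency(speedup, n):.3f}")

    # Serial fraction of parallel time that the scaled model would attribute
    # to each reported scaled speedup (an illustrative inversion).
    for speedup in (1009, 1020):
        print(f"scaled {speedup}: implied s = {implied_serial_fraction(speedup, n):.4f}")

    # The same numeric s behaves very differently under the two models
    # because it is measured against different baselines.
    s = 0.01
    print(f"s = {s}: fixed-size {fixed_size_speedup(s, n):.1f}, "
          f"scaled {scaled_speedup(s, n):.1f}")
```

The serial fraction is kept as an explicit parameter in each function so that the differing baselines stay visible; plugging the same s into both functions is exactly the comparison that the scaled-problem paradigm argues is misleading.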