Development of Parallel Methods for a $1024$-Processor Hypercube

SIAM Journal on Scientific and Statistical Computing. Pub Date: 1988-07-01. DOI: 10.1137/0909041

J. Gustafson, G. Montry, R. Benner

Citations: 456

Abstract

We have developed highly efficient parallel solutions for three practical, full-scale scientific problems: wave mechanics, fluid dynamics, and structural analysis. Several algorithmic techniques are used to keep communication and serial overhead small as both problem size and number of processors are varied. A new parameter, operation efficiency, is introduced that quantifies the tradeoff between communication and redundant computation. A 1024-processor MIMD ensemble is measured to be 502 to 637 times as fast as a single processor when problem size for the ensemble is fixed, and 1009 to 1020 times as fast as a single processor when problem size per processor is fixed. The latter measure, denoted scaled speedup, is developed and contrasted with the traditional measure of parallel speedup. The scaled-problem paradigm better reveals the capabilities of large ensembles, and permits detection of subtle hardware-induced load imbalances (such as error correction and data-dependent MFLOPS rates) that may become increasingly important as parallel processors increase in node count. Sustained performance for the applications is 70 to 130 MFLOPS, validating the massively parallel ensemble approach as a practical alternative to more conventional processing methods. The techniques presented appear extensible to even higher levels of parallelism than the 1024-processor level explored here.
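The contrast between fixed-size speedup and scaled speedup can be illustrated with a minimal sketch. The abstract does not give formulas, so the code below assumes the commonly cited textbook forms: Amdahl's law for a fixed total problem size, and the scaled-speedup relation for a fixed problem size per processor. The function names and the serial fractions are hypothetical, chosen for illustration only; they are not the paper's measured values.

```python
def fixed_size_speedup(serial_fraction: float, processors: int) -> float:
    """Amdahl-style speedup: total problem size is held fixed.

    serial_fraction is the share of single-processor run time that
    cannot be parallelized (an assumed input, not a measured value).
    """
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def scaled_speedup(serial_fraction: float, processors: int) -> float:
    """Scaled speedup: problem size per processor is held fixed.

    Here serial_fraction is the share of run time spent in serial work
    as observed on the parallel machine (again an assumed input).
    """
    return processors - (processors - 1) * serial_fraction


if __name__ == "__main__":
    n = 1024  # processor count explored in the paper
    # Hypothetical serial fractions, for illustration only.
    for s in (0.0005, 0.001, 0.005):
        print(
            f"serial fraction {s:.4f}: "
            f"fixed-size {fixed_size_speedup(s, n):7.1f}, "
            f"scaled {scaled_speedup(s, n):7.1f}"
        )
```

Under these assumptions, a serial fraction of 0.001 gives a fixed-size speedup of roughly 506 on 1024 processors, while the scaled speedup stays near 1023. This shows the kind of gap the abstract reports between the two measures, but it does not reproduce the paper's measurements.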
