Cost/performance of a parallel computer simulator

B. Falsafi, D. Wood
{"title":"Cost/performance of a parallel computer simulator","authors":"B. Falsafi, D. Wood","doi":"10.1145/182478.182587","DOIUrl":null,"url":null,"abstract":"This paper examines the cost/performance of simulating a hypothetical target parallel computer using a commercial host parallel computer. We address the question of whether parallel simulation is simply faster than sequential simulation, or if it is also more cost-effective. To answer this, we develop a performance model of the Wisconsin Wind Tunnel (WWT), a system that simulates cache-coherent shared-memory machines on a message-passing Thinking Machines CM-5. The performance model uses Kruskal and Weiss's fork-join model to account for the effect of event processing time variability on WWT's conservative fixed-window simulation algorithm. A generalization of Thiebaut and Stone's footprint model accurately predicts the effect of cache interference on the CM-5. The model is calibrated using parameters extracted from a fully-parallel simulation (p=N), and validated by measuring the speedup as the number of processors (p) ranges from one to the number of target nodes (N. Together with simple cost models, the performance model indicates that for target system sizes of 32 nodes and larger, parallel simulation is more cost-effective than sequential simulation. The key intuition behind this result is that large simulations require large memories, which dominate the cost of a uniprocessor; parallel computers allow multiple processors to simultaneously access this large memory.","PeriodicalId":194781,"journal":{"name":"Workshop on Parallel and Distributed Simulation","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1994-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Parallel and Distributed Simulation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/182478.182587","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

This paper examines the cost/performance of simulating a hypothetical target parallel computer using a commercial host parallel computer. We address the question of whether parallel simulation is simply faster than sequential simulation, or if it is also more cost-effective. To answer this, we develop a performance model of the Wisconsin Wind Tunnel (WWT), a system that simulates cache-coherent shared-memory machines on a message-passing Thinking Machines CM-5. The performance model uses Kruskal and Weiss's fork-join model to account for the effect of event processing time variability on WWT's conservative fixed-window simulation algorithm. A generalization of Thiebaut and Stone's footprint model accurately predicts the effect of cache interference on the CM-5. The model is calibrated using parameters extracted from a fully-parallel simulation (p=N), and validated by measuring the speedup as the number of processors (p) ranges from one to the number of target nodes (N. Together with simple cost models, the performance model indicates that for target system sizes of 32 nodes and larger, parallel simulation is more cost-effective than sequential simulation. The key intuition behind this result is that large simulations require large memories, which dominate the cost of a uniprocessor; parallel computers allow multiple processors to simultaneously access this large memory.
并行计算机模拟器的成本/性能
本文研究了使用商用主机并行计算机模拟假设目标并行计算机的成本/性能。我们解决的问题是,并行仿真是否比顺序仿真更快,或者它是否也更具成本效益。为了回答这个问题,我们开发了威斯康星风洞(WWT)的性能模型,这是一个在消息传递思维机器CM-5上模拟缓存一致共享内存机器的系统。性能模型使用Kruskal和Weiss的fork-join模型来考虑事件处理时间可变性对WWT保守的固定窗口模拟算法的影响。对Thiebaut和Stone的足迹模型的推广准确地预测了缓存干扰对CM-5的影响。利用从全并行仿真(p=N)中提取的参数对模型进行了校准,并通过测量处理器数量(p)从1到目标节点数量(N)时的加速速度进行了验证。结合简单的成本模型,性能模型表明,对于32节点及更大的目标系统规模,并行仿真比顺序仿真更具成本效益。这一结果背后的关键直觉是,大型模拟需要大型内存,这是单处理器的主要成本;并行计算机允许多个处理器同时访问这个大内存。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信