Cost/performance of a parallel computer simulator

Workshop on Parallel and Distributed Simulation. Pub Date: 1994-08-01. DOI: 10.1145/182478.182587

B. Falsafi, D. Wood


Citations: 12

Abstract

This paper examines the cost/performance of simulating a hypothetical target parallel computer using a commercial host parallel computer. We address the question of whether parallel simulation is simply faster than sequential simulation, or whether it is also more cost-effective. To answer this, we develop a performance model of the Wisconsin Wind Tunnel (WWT), a system that simulates cache-coherent shared-memory machines on a message-passing Thinking Machines CM-5. The performance model uses Kruskal and Weiss's fork-join model to account for the effect of event processing time variability on WWT's conservative fixed-window simulation algorithm. A generalization of Thiebaut and Stone's footprint model accurately predicts the effect of cache interference on the CM-5. The model is calibrated using parameters extracted from a fully-parallel simulation (p=N), and validated by measuring the speedup as the number of processors (p) ranges from one to the number of target nodes (N). Together with simple cost models, the performance model indicates that for target system sizes of 32 nodes and larger, parallel simulation is more cost-effective than sequential simulation. The key intuition behind this result is that large simulations require large memories, which dominate the cost of a uniprocessor; parallel computers allow multiple processors to simultaneously access this large memory.
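The fork-join effect and the cost argument in the abstract can be illustrated with a minimal Python sketch. It is not the paper's model: the exponential work distribution, the function names (`simulate_speedup`, `costup`, `parallel_is_cost_effective`), and the linear cost formula are all assumptions chosen for illustration. In each fixed window, every host processor handles the events of N/p target nodes and then waits at a barrier, so the window lasts as long as the slowest host. Parallel simulation is then treated as more cost-effective when speedup exceeds the ratio of parallel to sequential host cost.

```python
import random


def simulate_speedup(n_targets, p_hosts, windows=2000, seed=0):
    """Monte Carlo estimate of speedup for a fixed-window simulation.

    Assumption (not from the paper): the work each target node generates
    per window is exponentially distributed with mean 1.
    """
    if n_targets % p_hosts != 0:
        raise ValueError("p_hosts must divide n_targets evenly")
    rng = random.Random(seed)
    per_host = n_targets // p_hosts
    sequential = 0.0
    parallel = 0.0
    for _ in range(windows):
        work = [rng.expovariate(1.0) for _ in range(n_targets)]
        # Each host sums the work of its block of target nodes.
        host_times = [
            sum(work[h * per_host:(h + 1) * per_host])
            for h in range(p_hosts)
        ]
        # One processor would do all of the work back to back.
        sequential += sum(host_times)
        # Fork-join barrier: the window ends when the slowest host ends.
        parallel += max(host_times)
    return sequential / parallel


def costup(p_hosts, proc_cost, mem_cost):
    """Cost of a p-processor host relative to a uniprocessor.

    Assumption: both hosts need the same total memory, so only the
    processor count differs. All costs are hypothetical units.
    """
    return (p_hosts * proc_cost + mem_cost) / (proc_cost + mem_cost)


def parallel_is_cost_effective(speedup, cost_ratio):
    """True when cost x time is lower on the parallel host."""
    return speedup > cost_ratio


if __name__ == "__main__":
    s = simulate_speedup(n_targets=32, p_hosts=32)
    c = costup(p_hosts=32, proc_cost=1.0, mem_cost=31.0)
    print(f"speedup={s:.2f} costup={c:.2f} "
          f"cost-effective={parallel_is_cost_effective(s, c)}")
```

Under these assumptions the speedup at p=N falls well short of N because of event-time variability, yet it can still exceed the cost ratio when memory dominates the cost of the host, which is the intuition the abstract describes.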



