P. Barnes, C. Carothers, D. Jefferson, Justin M. LaPre
{"title":"翘曲速度:在1,966,080核上执行时间翘曲","authors":"P. Barnes, C. Carothers, D. Jefferson, Justin M. LaPre","doi":"10.1145/2486092.2486134","DOIUrl":null,"url":null,"abstract":"Time Warp is an optimistic synchronization protocol for parallel discrete event simulation that coordinates the available parallelism through its rollback and antimessage mechanisms. In this paper we present the results of a strong scaling study of the ROSS simulator running Time Warp with reverse computation and executing the well-known PHOLD benchmark on Lawrence Livermore National Laboratory's Sequoia Blue Gene/Q supercomputer. The benchmark has 251 million PHOLD logical processes and was executed in several configurations up to a peak of 7.86 million MPI tasks running on 1,966,080 cores. At the largest scale it processed 33 trillion events in 65 seconds, yielding a sustained speed of 504 billion events/second using 120 racks of Sequoia. This is by far the highest event rate reported by any parallel discrete event simulation to date, whether running PHOLD or any other benchmark. Additionally, we believe it is likely to be the largest number of MPI tasks ever used in any computation of any kind to date. ROSS exhibited a super-linear speedup throughout the strong scaling study, with more than a 97x speed improvement from scaling the number of cores by only 60x (from 32,768 to 1,966,080). We attribute this to significant cache-related performance acceleration as we moved to higher scales with fewer LPs per core. Prompted by historical performance results we propose a new, long term performance metric called Warp Speed that grows logarithmically with the PHOLD event rate. As we define it our maximum speed of 504 billion PHOLD events/sec corresponds to Warp 2.7. We suggest that the results described here are significant because they demonstrate that direct simulation of planetary-scale discrete event models are now, in principle at least, within reach.","PeriodicalId":115341,"journal":{"name":"Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"114","resultStr":"{\"title\":\"Warp speed: executing time warp on 1,966,080 cores\",\"authors\":\"P. Barnes, C. Carothers, D. Jefferson, Justin M. LaPre\",\"doi\":\"10.1145/2486092.2486134\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Time Warp is an optimistic synchronization protocol for parallel discrete event simulation that coordinates the available parallelism through its rollback and antimessage mechanisms. In this paper we present the results of a strong scaling study of the ROSS simulator running Time Warp with reverse computation and executing the well-known PHOLD benchmark on Lawrence Livermore National Laboratory's Sequoia Blue Gene/Q supercomputer. The benchmark has 251 million PHOLD logical processes and was executed in several configurations up to a peak of 7.86 million MPI tasks running on 1,966,080 cores. At the largest scale it processed 33 trillion events in 65 seconds, yielding a sustained speed of 504 billion events/second using 120 racks of Sequoia. This is by far the highest event rate reported by any parallel discrete event simulation to date, whether running PHOLD or any other benchmark. Additionally, we believe it is likely to be the largest number of MPI tasks ever used in any computation of any kind to date. ROSS exhibited a super-linear speedup throughout the strong scaling study, with more than a 97x speed improvement from scaling the number of cores by only 60x (from 32,768 to 1,966,080). We attribute this to significant cache-related performance acceleration as we moved to higher scales with fewer LPs per core. Prompted by historical performance results we propose a new, long term performance metric called Warp Speed that grows logarithmically with the PHOLD event rate. As we define it our maximum speed of 504 billion PHOLD events/sec corresponds to Warp 2.7. We suggest that the results described here are significant because they demonstrate that direct simulation of planetary-scale discrete event models are now, in principle at least, within reach.\",\"PeriodicalId\":115341,\"journal\":{\"name\":\"Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-05-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"114\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2486092.2486134\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 1st ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2486092.2486134","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 114
摘要
Time Warp是一种用于并行离散事件模拟的乐观同步协议,它通过回滚和反消息机制来协调可用的并行性。在本文中,我们介绍了在劳伦斯利弗莫尔国家实验室的红杉蓝基因/Q超级计算机上运行时间扭曲的ROSS模拟器与反向计算和执行著名的PHOLD基准的强尺度研究结果。该基准测试有2.51亿个PHOLD逻辑进程,并在几种配置中执行,最高可达786万个MPI任务,运行在1,966,080个内核上。在最大规模的情况下,它在65秒内处理了33万亿个事件,使用120个Sequoia机架的持续速度为5040亿个事件/秒。这是迄今为止任何并行离散事件模拟报告的最高事件率,无论是运行hold还是任何其他基准测试。此外,我们认为这可能是迄今为止在任何类型的计算中使用的MPI任务数量最多的一次。在整个强缩放研究中,ROSS表现出了超线性的加速,将内核数量仅缩放60倍(从32,768到1,966,080),速度提高了97倍以上。我们将此归因于随着我们迁移到每个核更少lp的更高规模,与缓存相关的显著性能加速。根据历史性能结果,我们提出了一个新的长期性能指标,称为Warp Speed,它随着hold事件率呈对数增长。根据我们的定义,5040亿hold事件/秒的最大速度对应于Warp 2.7。我们认为,这里描述的结果是重要的,因为它们表明,直接模拟行星尺度的离散事件模型,至少在原则上,现在是可以实现的。
Warp speed: executing time warp on 1,966,080 cores
Time Warp is an optimistic synchronization protocol for parallel discrete event simulation that coordinates the available parallelism through its rollback and antimessage mechanisms. In this paper we present the results of a strong scaling study of the ROSS simulator running Time Warp with reverse computation and executing the well-known PHOLD benchmark on Lawrence Livermore National Laboratory's Sequoia Blue Gene/Q supercomputer. The benchmark has 251 million PHOLD logical processes and was executed in several configurations up to a peak of 7.86 million MPI tasks running on 1,966,080 cores. At the largest scale it processed 33 trillion events in 65 seconds, yielding a sustained speed of 504 billion events/second using 120 racks of Sequoia. This is by far the highest event rate reported by any parallel discrete event simulation to date, whether running PHOLD or any other benchmark. Additionally, we believe it is likely to be the largest number of MPI tasks ever used in any computation of any kind to date. ROSS exhibited a super-linear speedup throughout the strong scaling study, with more than a 97x speed improvement from scaling the number of cores by only 60x (from 32,768 to 1,966,080). We attribute this to significant cache-related performance acceleration as we moved to higher scales with fewer LPs per core. Prompted by historical performance results we propose a new, long term performance metric called Warp Speed that grows logarithmically with the PHOLD event rate. As we define it our maximum speed of 504 billion PHOLD events/sec corresponds to Warp 2.7. We suggest that the results described here are significant because they demonstrate that direct simulation of planetary-scale discrete event models are now, in principle at least, within reach.