GVT algorithms and discrete event dynamics on 129K+ processor cores

2011 18th International Conference on High Performance Computing. Pub Date: 2011-12-18. DOI: 10.1109/HiPC.2011.6152725

K. Perumalla, Alfred Park, V. Tipparaju


Citations: 13

Abstract

Parallel discrete event simulation (PDES) represents a class of codes that are challenging to scale to large numbers of processors because of tight global timestamp ordering and fine-grained event execution. One of the critical factors in scaling PDES is the efficiency of the underlying global virtual time (GVT) algorithm, which is needed for correctness of parallel execution and for speed of progress. Although many GVT algorithms have been proposed previously, few have been proposed for scalable asynchronous execution and none has been customized to exploit one-sided communication. Moreover, the detailed performance effects of actual GVT algorithm implementations on large platforms are unknown. Here, three major GVT algorithms intended for scalable execution on high-performance systems are studied: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm, proposed and studied for the first time here, that exploits one-sided communication in extant supercomputing platforms. Performance results are presented for implementations of these algorithms on up to 129,024 cores of a Cray XT5 system, exercised over a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance on the order of tens of billions of events executed per second is registered, exceeding the speeds of any known PDES engine and showing asynchronous GVT algorithms to outperform state-of-the-art synchronous GVT algorithms. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly coupled discrete event dynamics on massively parallel platforms.
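As a rough illustration of what a GVT computation produces: at any instant, GVT is commonly defined as the minimum over every processor's local virtual time and the timestamps of all messages still in transit. The sketch below is a single-process Python toy, not the paper's implementation. The `Processor` class and the `synchronous_gvt` function are hypothetical names introduced here, and the loop stands in for what a real synchronous algorithm would do with a blocking global reduction across processors.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical stand-in for one simulation processor (not from the paper)."""
    lvt: float  # timestamp of the earliest unprocessed local event
    # timestamps of messages this processor has sent that are not yet received
    in_transit: list = field(default_factory=list)


def synchronous_gvt(processors):
    """Return the minimum over all local virtual times and in-transit timestamps.

    A real synchronous algorithm would have every processor pause and
    take part in a global min-reduction; here a plain loop plays that role.
    """
    candidates = []
    for p in processors:
        candidates.append(p.lvt)
        candidates.extend(p.in_transit)
    return min(candidates)


procs = [
    Processor(lvt=12.0),
    Processor(lvt=9.5, in_transit=[8.0]),
    Processor(lvt=15.0, in_transit=[20.0]),
]
gvt = synchronous_gvt(procs)
print(gvt)  # 8.0: the in-transit message holds GVT below every local clock
```

In this toy, the message with timestamp 8.0 is what bounds GVT, which is why an algorithm must account for in-transit messages and not only local clocks. The asynchronous variants described in the abstract aim to reach the same value without making every processor block at once.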
