Dynamically Adjusting Core Frequencies to Accelerate Time Warp Simulations in Many-Core Processors

2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation | Pub Date: 2012-07-15 | DOI: 10.1109/PADS.2012.15

Ryan Child, P. Wilsey


Citations: 14

Abstract

Time Warp synchronized parallel discrete event simulators operate asynchronously and aggressively, without explicit synchronization between the concurrently executing simulators. Instead of an explicit synchronization mechanism, the concurrent simulators maintain a common virtual clock model and implement a rollback/recovery mechanism that restores causal order when out-of-order events are detected. When the critical path of execution is balanced across the parallel simulators, the result can be a highly effective, lightweight synchronization mechanism. However, workload imbalances across the parallel simulators can cause excessive rollback at some nodes and ultimately slow the overall simulation as prematurely computed and transmitted events are processed. On small shared-memory multi-core systems, a lowest-time-stamp-first scheduling policy can effectively balance the workload. On larger many-core chips, however, conventional load balancing and workload migration will once again become necessary. Fortunately, emerging many-core chips contain features that can potentially be exploited to improve the performance of parallel simulations. For example, the Intel Single-chip Cloud Computer (SCC) provides mechanisms that a running application can use to adjust the frequency/voltage of different regions (called islands) of the chip. These islands are network and processing core centric; thus, in a Time Warp simulation, one can increase the frequency of the cores executing threads on the critical path (those experiencing infrequent rollback) and decrease the frequency of the cores executing threads off the critical path (those experiencing excessive rollback).
This paper investigates run-time control and adjustment of core frequency on an AMD Phenom II X6 multi-core processor to explore and demonstrate that dynamic run-time control of core frequency can sometimes improve the performance of a Time Warp synchronized parallel simulation.
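The frequency-adjustment idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `CoreStats` record, the rollback-ratio thresholds, and the frequency levels are all hypothetical, chosen only to show how per-core rollback behavior might be mapped to a target frequency (fast for cores with infrequent rollback, slow for cores with excessive rollback). Applying the resulting plan to hardware is platform-specific and is left out.

```python
from dataclasses import dataclass


@dataclass
class CoreStats:
    """Per-core counters for one sampling interval (hypothetical names)."""
    core_id: int
    events_processed: int
    events_rolled_back: int


def rollback_ratio(stats: CoreStats) -> float:
    """Fraction of processed events that were later undone by rollback."""
    if stats.events_processed == 0:
        return 0.0
    return stats.events_rolled_back / stats.events_processed


def choose_frequencies(stats_list, freq_levels_khz, low=0.05, high=0.30):
    """Return a {core_id: frequency} plan.

    Assumed policy (thresholds `low` and `high` are illustrative):
      ratio <= low   -> treated as on the critical path, fastest level
      ratio >= high  -> treated as off the critical path, slowest level
      otherwise      -> a middle level
    """
    levels = sorted(freq_levels_khz)
    slowest, fastest = levels[0], levels[-1]
    middle = levels[len(levels) // 2]

    plan = {}
    for stats in stats_list:
        ratio = rollback_ratio(stats)
        if ratio <= low:
            plan[stats.core_id] = fastest
        elif ratio >= high:
            plan[stats.core_id] = slowest
        else:
            plan[stats.core_id] = middle
    return plan


if __name__ == "__main__":
    # Illustrative numbers only; not measurements from the paper.
    sample = [
        CoreStats(core_id=0, events_processed=1000, events_rolled_back=10),
        CoreStats(core_id=1, events_processed=1000, events_rolled_back=500),
        CoreStats(core_id=2, events_processed=1000, events_rolled_back=150),
    ]
    levels = [800_000, 1_600_000, 2_400_000, 3_200_000]
    for core, freq in choose_frequencies(sample, levels).items():
        print(f"core {core}: {freq} kHz")
```

A three-band threshold policy is used here only because it is easy to read; a real controller would likely need hysteresis or smoothing over several intervals so that cores do not oscillate between levels.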

