Federica Montesano, Romolo Marotta, Francesco Quaglia
{"title":"多核计算机上基于空间/时间位置的推测性离散事件仿真中的负载分担","authors":"Federica Montesano, Romolo Marotta, Francesco Quaglia","doi":"10.1145/3639703","DOIUrl":null,"url":null,"abstract":"<p>Shared-memory multi-processor/multi-core machines have become a reference for many application contexts. In particular, the recent literature on speculative parallel discrete event simulation has reshuffled the architectural organization of simulation systems in order to deeply exploit the main features of this type of machines. A core aspect dealt with has been the full sharing of the workload at the level of individual simulation events, which enables keeping the rollback incidence minimal. However, making each worker thread continuously switch its execution between events destined to different simulation objects does not favor locality. This problem appears even more evident in the case of Non-Uniform-Memory-Access (NUMA) machines, where memory accesses generating a cache miss to be served by a far NUMA node give rise to both higher latency and higher traffic at the level of the NUMA interconnection. In this article, we propose a workload-sharing algorithm where the worker threads can have short-term binding with specific simulation objects to favor spatial locality. The new bindings—carried out when a thread decides to switch its execution to other simulation objects—are based on both (a) the timeline according to which the object states have passed through the caching hierarchy and (b) the (dynamic) placement of objects within the NUMA architecture. At the same time, our solution still enables the worker threads to focus their activities on the events to be processed whose timestamps are closer to the simulation commit horizon—hence we exploit temporal locality along virtual time and keep the rollback incidence minimal. In our design we exploit lock-free constructs to support scalable thread synchronization while accessing the shared event pool. Furthermore, we exploit a multi-view approach of the event pool content, which additionally favors local accesses to the parts of the event pool that are currently relevant for the thread activity. Our solution has been released as an integration within the USE (Ultimate-Share-Everything) open source speculative simulation platform available to the community. Furthermore, in this article we report the results of an experimental study that shows the effectiveness of our proposal.</p>","PeriodicalId":50943,"journal":{"name":"ACM Transactions on Modeling and Computer Simulation","volume":null,"pages":null},"PeriodicalIF":0.7000,"publicationDate":"2024-01-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines\",\"authors\":\"Federica Montesano, Romolo Marotta, Francesco Quaglia\",\"doi\":\"10.1145/3639703\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Shared-memory multi-processor/multi-core machines have become a reference for many application contexts. In particular, the recent literature on speculative parallel discrete event simulation has reshuffled the architectural organization of simulation systems in order to deeply exploit the main features of this type of machines. A core aspect dealt with has been the full sharing of the workload at the level of individual simulation events, which enables keeping the rollback incidence minimal. However, making each worker thread continuously switch its execution between events destined to different simulation objects does not favor locality. This problem appears even more evident in the case of Non-Uniform-Memory-Access (NUMA) machines, where memory accesses generating a cache miss to be served by a far NUMA node give rise to both higher latency and higher traffic at the level of the NUMA interconnection. In this article, we propose a workload-sharing algorithm where the worker threads can have short-term binding with specific simulation objects to favor spatial locality. The new bindings—carried out when a thread decides to switch its execution to other simulation objects—are based on both (a) the timeline according to which the object states have passed through the caching hierarchy and (b) the (dynamic) placement of objects within the NUMA architecture. At the same time, our solution still enables the worker threads to focus their activities on the events to be processed whose timestamps are closer to the simulation commit horizon—hence we exploit temporal locality along virtual time and keep the rollback incidence minimal. In our design we exploit lock-free constructs to support scalable thread synchronization while accessing the shared event pool. Furthermore, we exploit a multi-view approach of the event pool content, which additionally favors local accesses to the parts of the event pool that are currently relevant for the thread activity. Our solution has been released as an integration within the USE (Ultimate-Share-Everything) open source speculative simulation platform available to the community. Furthermore, in this article we report the results of an experimental study that shows the effectiveness of our proposal.</p>\",\"PeriodicalId\":50943,\"journal\":{\"name\":\"ACM Transactions on Modeling and Computer Simulation\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.7000,\"publicationDate\":\"2024-01-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Modeling and Computer Simulation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3639703\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Modeling and Computer Simulation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3639703","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
引用次数: 0
摘要
共享内存多处理器/多核机器已成为许多应用环境的参考。特别是最近关于投机并行离散事件仿真的文献,重新调整了仿真系统的架构组织,以便深入利用这类机器的主要特点。其中涉及的一个核心问题是在单个仿真事件的层面上完全分担工作量,从而将回滚发生率降至最低。然而,让每个工作线程在不同仿真对象的事件之间不断切换执行并不利于本地化。在非统一内存访问(NUMA)机器上,这个问题显得更加明显,因为内存访问会产生缓存缺失,需要由较远的 NUMA 节点提供服务,从而导致 NUMA 互联层面的延迟和流量增加。在本文中,我们提出了一种工作负载分担算法,在这种算法中,工作线程可以与特定的仿真对象进行短期绑定,以提高空间位置性。新的绑定--当线程决定将其执行切换到其他仿真对象时进行--基于(a)对象状态通过缓存层级的时间轴和(b)对象在 NUMA 架构中的(动态)位置。同时,我们的解决方案还能让工作线程将其活动集中在时间戳更接近仿真提交范围的待处理事件上,因此我们利用了虚拟时间的时间定位性,并将回滚发生率保持在最低水平。在我们的设计中,我们利用无锁结构来支持可扩展的线程同步,同时访问共享事件池。此外,我们还采用了多视角的事件池内容方法,这更有利于本地访问当前与线程活动相关的事件池部分。我们的解决方案已作为 USE(Ultimate-Share-Everything)开源投机模拟平台的集成发布,供社区使用。此外,我们还在本文中报告了一项实验研究的结果,显示了我们建议的有效性。
Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines
Shared-memory multi-processor/multi-core machines have become a reference for many application contexts. In particular, the recent literature on speculative parallel discrete event simulation has reshuffled the architectural organization of simulation systems in order to deeply exploit the main features of this type of machines. A core aspect dealt with has been the full sharing of the workload at the level of individual simulation events, which enables keeping the rollback incidence minimal. However, making each worker thread continuously switch its execution between events destined to different simulation objects does not favor locality. This problem appears even more evident in the case of Non-Uniform-Memory-Access (NUMA) machines, where memory accesses generating a cache miss to be served by a far NUMA node give rise to both higher latency and higher traffic at the level of the NUMA interconnection. In this article, we propose a workload-sharing algorithm where the worker threads can have short-term binding with specific simulation objects to favor spatial locality. The new bindings—carried out when a thread decides to switch its execution to other simulation objects—are based on both (a) the timeline according to which the object states have passed through the caching hierarchy and (b) the (dynamic) placement of objects within the NUMA architecture. At the same time, our solution still enables the worker threads to focus their activities on the events to be processed whose timestamps are closer to the simulation commit horizon—hence we exploit temporal locality along virtual time and keep the rollback incidence minimal. In our design we exploit lock-free constructs to support scalable thread synchronization while accessing the shared event pool. Furthermore, we exploit a multi-view approach of the event pool content, which additionally favors local accesses to the parts of the event pool that are currently relevant for the thread activity. Our solution has been released as an integration within the USE (Ultimate-Share-Everything) open source speculative simulation platform available to the community. Furthermore, in this article we report the results of an experimental study that shows the effectiveness of our proposal.
期刊介绍:
The ACM Transactions on Modeling and Computer Simulation (TOMACS) provides a single archival source for the publication of high-quality research and developmental results referring to all phases of the modeling and simulation life cycle. The subjects of emphasis are discrete event simulation, combined discrete and continuous simulation, as well as Monte Carlo methods.
The use of simulation techniques is pervasive, extending to virtually all the sciences. TOMACS serves to enhance the understanding, improve the practice, and increase the utilization of computer simulation. Submissions should contribute to the realization of these objectives, and papers treating applications should stress their contributions vis-á-vis these objectives.