Spatial/Temporal Locality-based Load-sharing in Speculative Discrete Event Simulation on Multi-core Machines

Impact factor 0.7; Region 4 (Computer Science); JCR Q4 (Computer Science, Interdisciplinary Applications)
Federica Montesano, Romolo Marotta, Francesco Quaglia
ACM Transactions on Modeling and Computer Simulation. Journal article, published 2024-01-08. DOI: 10.1145/3639703 (https://doi.org/10.1145/3639703)
Citation count: 0

Abstract

Shared-memory multi-processor/multi-core machines have become a reference platform for many application contexts. In particular, the recent literature on speculative parallel discrete event simulation has reshuffled the architectural organization of simulation systems in order to fully exploit the main features of this type of machine. A core aspect has been the full sharing of the workload at the level of individual simulation events, which keeps the rollback incidence minimal. However, making each worker thread continuously switch its execution between events destined for different simulation objects does not favor locality. This problem is even more evident on Non-Uniform Memory Access (NUMA) machines, where a memory access that generates a cache miss served by a far NUMA node incurs both higher latency and higher traffic on the NUMA interconnect. In this article, we propose a workload-sharing algorithm in which worker threads can have short-term bindings with specific simulation objects to favor spatial locality. The new bindings, carried out when a thread decides to switch its execution to other simulation objects, are based on both (a) the timeline according to which the object states have passed through the caching hierarchy and (b) the (dynamic) placement of objects within the NUMA architecture. At the same time, our solution still enables the worker threads to focus their activities on the events to be processed whose timestamps are closer to the simulation commit horizon; hence we exploit temporal locality along virtual time and keep the rollback incidence minimal. In our design we exploit lock-free constructs to support scalable thread synchronization while accessing the shared event pool. Furthermore, we exploit a multi-view approach to the event pool content, which additionally favors local accesses to the parts of the event pool that are currently relevant to the thread activity. Our solution has been released as an integration within the USE (Ultimate-Share-Everything) open-source speculative simulation platform, which is available to the community. Finally, we report the results of an experimental study that shows the effectiveness of our proposal.
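
The binding criteria named in the abstract can be illustrated with a minimal sketch. It is not the USE implementation: the `SimObject` fields, the `pick_binding` function, the `window` and `cache_span` parameters, and the strict priority order (cache recency, then NUMA locality, then timestamp) are all assumptions made for illustration, and the real algorithm operates on a lock-free shared event pool rather than a Python list.

```python
from dataclasses import dataclass


@dataclass
class SimObject:
    """Hypothetical per-object bookkeeping; all field names are assumptions."""
    obj_id: int
    next_ts: float   # timestamp of the object's earliest pending event
    numa_node: int   # NUMA node currently hosting the object's state
    last_touch: int  # logical clock of this thread's last access (0 = never)


def pick_binding(objects, commit_horizon, window, thread_node, clock, cache_span):
    """Return the object a thread could bind to next, or None.

    Temporal locality: only objects whose next event lies within `window`
    of the commit horizon are candidates.
    Spatial locality: among candidates, prefer state that was touched
    recently (a proxy for still being cached), then state hosted on the
    thread's own NUMA node; ties go to the lowest timestamp.
    """
    candidates = [o for o in objects if o.next_ts <= commit_horizon + window]
    if not candidates:
        return None

    def rank(o):
        likely_cached = o.last_touch > 0 and clock - o.last_touch <= cache_span
        is_local = o.numa_node == thread_node
        # False sorts before True, so each preference is negated.
        return (not likely_cached, not is_local, o.next_ts)

    return min(candidates, key=rank)


objects = [
    SimObject(0, next_ts=10.0, numa_node=1, last_touch=0),
    SimObject(1, next_ts=12.0, numa_node=0, last_touch=95),
    SimObject(2, next_ts=11.0, numa_node=0, last_touch=0),
    SimObject(3, next_ts=50.0, numa_node=0, last_touch=99),
]
chosen = pick_binding(objects, commit_horizon=10.0, window=5.0,
                      thread_node=0, clock=100, cache_span=10)
print(chosen.obj_id)  # prints 1
```

Object 3 is the most recently touched but is excluded because its next event lies far beyond the commit horizon, which is how the sketch keeps the temporal criterion ahead of the spatial one. Among the remaining candidates, object 1 wins because its state was touched recently and it resides on the thread's NUMA node.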

Source journal
ACM Transactions on Modeling and Computer Simulation
Category: Engineering and Technology, Computer Science: Interdisciplinary Applications
CiteScore: 2.50
Self-citation rate: 22.20%
Publication volume: 29
Review time: >12 weeks
Journal description: The ACM Transactions on Modeling and Computer Simulation (TOMACS) provides a single archival source for the publication of high-quality research and developmental results referring to all phases of the modeling and simulation life cycle. The subjects of emphasis are discrete event simulation, combined discrete and continuous simulation, as well as Monte Carlo methods. The use of simulation techniques is pervasive, extending to virtually all the sciences. TOMACS serves to enhance the understanding, improve the practice, and increase the utilization of computer simulation. Submissions should contribute to the realization of these objectives, and papers treating applications should stress their contributions vis-à-vis these objectives.