NUMA-Aware Non-Blocking Calendar Queue

2020 IEEE/ACM 24th International Symposium on Distributed Simulation and Real Time Applications (DS-RT). Published 2020-09-01. DOI: 10.1109/DS-RT50469.2020.9213639

Maryan Rab, Romolo Marotta, Mauro Ianni, Alessandro Pellegrini, F. Quaglia

Abstract

Modern computing platforms are based on multi-processor/multi-core technology, which allows applications to run with a high degree of hardware parallelism. However, medium-to-high-end machines pose a problem related to the asymmetric delays that threads experience when accessing shared data. Specifically, Non-Uniform Memory Access (NUMA) is the dominant technology, thanks to its ability to scale up memory bandwidth, but it imposes asymmetric distances between CPU cores and memory banks, so that a thread accessing data placed on a distant NUMA node can suffer a severe performance penalty. In this article, we tackle this problem in the context of shared event-pool management, a relevant aspect in many fields, such as parallel discrete event simulation. Specifically, we present a NUMA-aware calendar queue, which also has the advantage of letting concurrent threads coordinate via a scalable non-blocking approach. Our proposal is based on work deferring combined with dynamic re-binding of the calendar queue operations (insertions/extractions) to the best-suited thread among the concurrent threads hosted by the underlying computing platform. This changes the locality of the operations performed by threads in a way that has a positive effect on NUMA-related activity at the hardware level. We report the results of an experimental study demonstrating that our solution achieves on the order of 15% better performance than state-of-the-art solutions already suited for multicore environments.
