Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT

2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Pub Date: 2011-05-23, DOI: 10.1109/CCGrid.2011.39

Simone Secchi, Antonino Tumeo, Oreste Villa

Citations: 5

Abstract

Distributed Shared Memory (DSM) machines are a wide class of multiprocessor computing systems in which a large, virtually shared address space is mapped onto a network of physically distributed memories. High memory latency and network contention are two of the main factors limiting the performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward massive hardware multithreading and fine-grained memory hashing, which tolerate irregular latencies, avoid network hot spots, and improve scalability. Parallel simulation is a promising approach that has been used extensively to model the performance of such large-scale machines, and network modeling is one of the most critical factors in managing the trade-off between simulation speed and accuracy. The Cray XMT is a massively multithreaded supercomputing architecture that belongs to the DSM class. In this paper, we discuss the development of a network contention model for a full-system XMT simulator. We first measure the effects of network contention on a 128-processor XMT machine; we then investigate the trade-off between simulation accuracy and speed by comparing three network models that operate at different levels of accuracy. The comparison and model validation are performed by executing a string-matching algorithm on both the full-system simulator and the actual machine, using three datasets that generate noticeably different contention patterns. Results show that the simulator's execution time remains within 10% of the real machine's. We also show that the slowdown due to contention modeling is limited to 20% when simulating a small number of processors and becomes negligible for simulations with higher processor counts.
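The abstract contrasts network models at different levels of accuracy and credits fine-grained memory hashing with avoiding hot spots. A minimal Python sketch can make both ideas concrete under a toy setup: a contention-free model that charges a fixed latency per request, and a simple queuing model that serializes requests arriving at the same memory node. The constants `BASE_LATENCY`, `SERVICE_TIME`, `BLOCK_BYTES`, the 16-node system, and the word-interleaved mapping in `node_for` are all hypothetical. They are not the three models evaluated in the paper, nor parameters of the Cray XMT.

```python
"""Toy contrast between a contention-free and a contention-aware network model.

All constants and the address-to-node mapping are hypothetical, chosen only
for illustration; they are not Cray XMT parameters or the paper's models.
"""
from collections import defaultdict
from typing import List, Tuple

BASE_LATENCY = 100  # assumed network round trip in cycles (50 each way)
SERVICE_TIME = 1    # assumed cycles a memory node spends per request
BLOCK_BYTES = 1024  # assumed coarse placement granularity (no hashing)
WORD_BYTES = 8      # fine-grained distribution granularity


def node_for(address: int, num_nodes: int, hashed: bool) -> int:
    """Map a byte address to a memory node.

    hashed=False: whole 1 KiB blocks live on one node (coarse placement).
    hashed=True: consecutive 8-byte words are interleaved across nodes,
    a simple stand-in for fine-grained memory hashing.
    """
    if hashed:
        return (address // WORD_BYTES) % num_nodes
    return (address // BLOCK_BYTES) % num_nodes


def fixed_latency_model(requests: List[Tuple[int, int]]) -> List[int]:
    """Contention-free model: every request pays the same latency."""
    return [BASE_LATENCY + SERVICE_TIME for _ in requests]


def queuing_model(requests: List[Tuple[int, int]], num_nodes: int,
                  hashed: bool) -> List[int]:
    """Contention-aware model: requests to the same node are serialized.

    requests holds (issue_cycle, address) pairs. A request reaches its node
    after half the round trip, waits until the node is free, is served for
    SERVICE_TIME cycles, then travels back. Latencies are returned in
    issue order, with ties broken by address.
    """
    one_way = BASE_LATENCY // 2
    node_free_at = defaultdict(int)  # cycle at which each node becomes idle
    latencies = []
    for issue, addr in sorted(requests):
        node = node_for(addr, num_nodes, hashed)
        start = max(issue + one_way, node_free_at[node])
        node_free_at[node] = start + SERVICE_TIME
        latencies.append(start + SERVICE_TIME + one_way - issue)
    return latencies


if __name__ == "__main__":
    num_nodes = 16
    # 128 threads each load one consecutive 8-byte word at cycle 0.
    reqs = [(0, WORD_BYTES * i) for i in range(128)]
    fixed = fixed_latency_model(reqs)
    for hashed in (False, True):
        queued = queuing_model(reqs, num_nodes, hashed)
        print(f"hashed={hashed}: fixed max={max(fixed)}, "
              f"queuing max={max(queued)}")
```

Under these assumptions, the contention-free model reports 101 cycles for every load in both layouts. The queuing model reports a worst case of 228 cycles when all loads fall in one block and 108 cycles when the words are interleaved. A fixed-latency model cannot capture that gap, and tracking it is what costs the contention-aware model extra simulation work.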
