A case study in using massively parallel simulation for extreme-scale torus network codesign

M. Mubarak, C. Carothers, R. Ross, P. Carns
Journal: SIGSIM Principles of Advanced Discrete Simulation
Published: 2014-05-18
DOI: 10.1145/2601381.2601383 (https://doi.org/10.1145/2601381.2601383)
Citations: 29

Abstract

A high-bandwidth, low-latency interconnect will be a critical component of future exascale systems. The torus network topology, which uses multidimensional network links to improve path diversity and exploit locality between nodes, is a potential candidate for exascale interconnects. The communication behavior of large-scale scientific applications running on future exascale networks is particularly important, and analytical/algorithmic models alone cannot deduce it. Therefore, before building systems, it is important to explore the design space and performance of candidate exascale interconnects by using simulation. We improve upon previous work in this area and present a methodology for modeling and simulating a high-fidelity, validated, and scalable torus network topology at packet-chunk-level detail using the Rensselaer Optimistic Simulation System (ROSS). We execute various configurations of a 1.3-million-node torus network model in order to examine the effect of torus dimensionality on network performance with relevant HPC traffic patterns. To the best of our knowledge, these are the largest torus network simulations carried out at this level of fidelity. In terms of simulation performance, a 1.3-million-node, 9-D torus network model is shown to process a simulated exascale-class workload of nearest-neighbor traffic with 100 million message injections per second per node using 65,536 Blue Gene/Q cores in a simulation run time of only 25 seconds. We also demonstrate that massive-scale simulations are a critical tool in exascale system design, since small-scale torus simulations are not always indicative of network behavior at exascale size. The take-away message from this case study is that massively parallel simulation is a key enabler for effective extreme-scale network codesign.
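
As an illustration of what "torus dimensionality" means in the abstract above, the following is a minimal Python sketch of torus addressing: each node has a coordinate per dimension, links wrap around at the edges, and the minimal hop count between two nodes is the sum of the per-dimension wraparound distances. It is not the paper's ROSS model and simulates no traffic. The function names and the 4,096-node example shapes are hypothetical, chosen only to show how the worst-case hop count changes with dimensionality at a fixed node count.

```python
def torus_neighbors(coord, dims):
    """Return the sorted neighbors of `coord` in a torus with side lengths `dims`.

    Each axis contributes a -1 and a +1 neighbor, with wraparound at the edges.
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (coord[axis] + step) % size
            neighbors.add(tuple(nxt))
    # On an axis of size 1 the "neighbor" is the node itself; drop it.
    neighbors.discard(tuple(coord))
    return sorted(neighbors)


def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b, using wraparound links."""
    return sum(min((x - y) % n, (y - x) % n) for x, y, n in zip(a, b, dims))


def torus_diameter(dims):
    """Worst-case minimal hop count: half of each side length, summed."""
    return sum(n // 2 for n in dims)


def node_count(dims):
    """Total number of nodes in the torus."""
    total = 1
    for n in dims:
        total *= n
    return total


if __name__ == "__main__":
    # Hypothetical shapes, all with 4,096 nodes (not the paper's configurations).
    for dims in [(64, 64), (16, 16, 16), (8, 8, 8, 8)]:
        print(len(dims), "D", node_count(dims), "nodes, diameter", torus_diameter(dims))
```

For a fixed node count, the higher-dimensional shapes have a smaller diameter and more links per node. Hop count is only a structural property, though; how it translates into delivered performance under real traffic is what the simulations described in the abstract measure.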