Fast Behavioural RTL Simulation of 10B Transistor SoC Designs with Metro-MPI

Guillem López-Paradís, Brian Li, Adrià Armejach, Stefan Wallentowitz, Miquel Moretó, Jonathan Balkind
Published in: 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2023-04-01
DOI: https://doi.org/10.23919/DATE56975.2023.10137080

Abstract

Chips with tens of billions of transistors have become today's norm. These designs are straining our electronic design automation tools throughout the design process, requiring ever more computational resources. In many tools, parallelisation has improved both latency and throughput for the designer's benefit. However, tools largely remain restricted to a single machine and in the case of RTL simulation, we believe that this leaves much potential performance on the table. We introduce Metro-MPI to improve RTL simulation for modern 10 billion transistor-scale chips. Metro-MPI exploits the natural boundaries present in chip designs to partition RTL simulations and leverage High Performance Computing (HPC) techniques to extract parallelism. For chip designs that scale in size by exploiting latency-insensitive interfaces like networks-on-chip and AXI, Metro-MPI offers a new paradigm for RTL simulation scalability. Our implementation of Metro-MPI in OpenPiton+Ariane delivers 2.7 MIPS of RTL simulation throughput for the first time on a design with more than 10 billion transistors and 1,024 Linux-capable cores, opening new avenues for distributed RTL simulation of emerging system-on-chip designs. Compared to sequential and multithreaded RTL simulations of smaller designs, Metro-MPI achieves up to $135.98\times$ and $9.29\times$ speedups, respectively. Similarly, for a representative regression run, Metro-MPI reduces energy consumption by up to $2.53\times$ and $2.91\times$, respectively.
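The abstract's central idea is that latency-insensitive interfaces form natural cut points for partitioning a simulation. The following is a minimal, single-process Python sketch of why that property matters. It is not Metro-MPI's implementation, and every name in it (`Channel`, `simulate`, the `latency` and `capacity` parameters) is hypothetical. It models two partitions joined by a back-pressured FIFO and shows that the data the consumer observes does not depend on how many cycles the link takes; only the completion time changes. In a distributed setting the channel could plausibly be carried by message passing between processes, but that mapping is an assumption made here for illustration.

```python
from collections import deque


class Channel:
    """Hypothetical FIFO link with a fixed latency and bounded capacity.

    Stands in for a latency-insensitive interface: the receiver only acts
    on data once it has arrived, and the sender stalls when the link is full.
    """

    def __init__(self, latency, capacity=4):
        self.latency = latency
        self.capacity = capacity
        self.in_flight = deque()  # entries are (arrival_cycle, payload)

    def can_send(self):
        # Back-pressure: refuse new data while the link is full.
        return len(self.in_flight) < self.capacity

    def send(self, cycle, payload):
        self.in_flight.append((cycle + self.latency, payload))

    def try_recv(self, cycle):
        # Deliver the oldest payload only if it has arrived by this cycle.
        if self.in_flight and self.in_flight[0][0] <= cycle:
            return self.in_flight.popleft()[1]
        return None


def simulate(latency, payloads, max_cycles=10_000):
    """Run a producer partition and a consumer partition over one channel.

    Returns (received_payloads, cycle_at_which_everything_was_delivered).
    """
    chan = Channel(latency)
    pending = deque(payloads)
    received = []
    for cycle in range(max_cycles):
        # Consumer partition: accept at most one payload per cycle.
        item = chan.try_recv(cycle)
        if item is not None:
            received.append(item)
        # Producer partition: send one payload per cycle when there is room.
        if pending and chan.can_send():
            chan.send(cycle, pending.popleft())
        if not pending and not chan.in_flight:
            return received, cycle
    raise RuntimeError("simulation did not finish within max_cycles")


if __name__ == "__main__":
    data = list(range(20))
    fast, fast_done = simulate(latency=1, payloads=data)
    slow, slow_done = simulate(latency=50, payloads=data)
    print(fast == slow == data, fast_done, slow_done)
```

Because the consumer's observable behaviour is identical at both latencies, a cut at such an interface can tolerate the extra delay introduced by moving data between partitions, at the cost of a longer run in simulated cycles when the link is slow.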