Microarchitectural comparison and in-core modeling of state-of-the-art CPUs: Grace, Sapphire Rapids, and Genoa
Jan Laukemann, Georg Hager, Gerhard Wellein
arXiv - CS - Performance, 2024-09-12
DOI: arxiv-2409.08108 (https://doi.org/arxiv-2409.08108)
Abstract
With Nvidia's release of the Grace Superchip, all three big semiconductor
companies in HPC (AMD, Intel, Nvidia) are currently competing in the race for
the best CPU. In this work we analyze the performance of these state-of-the-art
CPUs and create an accurate in-core performance model for their
microarchitectures Zen 4, Golden Cove, and Neoverse V2, extending the Open
Source Architecture Code Analyzer (OSACA) tool and comparing it with LLVM-MCA.
Starting from the peculiarities, strengths, and weaknesses of a single core, we
extend our comparison to a variety of microbenchmarks and to the capabilities of a
full node. The "write-allocate (WA) evasion" feature, which can automatically
reduce the memory traffic caused by write misses, receives special attention;
we show that the Grace Superchip has a next-to-optimal implementation of WA
evasion, and that the only way to avoid write allocates on Zen 4 is the
explicit use of non-temporal stores.
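
To illustrate why write allocates matter, the following is a minimal sketch of the usual traffic accounting for streaming kernels. It is not taken from the paper and reproduces none of its measurements. It assumes double-precision arrays much larger than the caches, and that a write-allocate reads each stored cache line from memory once before it is overwritten. The function names `bytes_per_iteration` and `wa_overhead` and the kernel list are hypothetical, chosen only for this example.

```python
def bytes_per_iteration(loads, stores, write_allocate, elem_bytes=8):
    """Memory traffic in bytes per loop iteration of a streaming kernel.

    loads:  number of distinct arrays read per iteration
    stores: number of distinct arrays written per iteration
    write_allocate: True if each store miss first reads the target
                    cache line from memory (no WA evasion, no
                    non-temporal stores)
    """
    read_traffic = loads * elem_bytes
    write_traffic = stores * elem_bytes
    # Assumption: a write-allocate adds one extra read per stored element.
    wa_traffic = stores * elem_bytes if write_allocate else 0
    return read_traffic + write_traffic + wa_traffic


def wa_overhead(loads, stores, elem_bytes=8):
    """Ratio of traffic with write-allocates to traffic without them."""
    with_wa = bytes_per_iteration(loads, stores, True, elem_bytes)
    without_wa = bytes_per_iteration(loads, stores, False, elem_bytes)
    return with_wa / without_wa


if __name__ == "__main__":
    # (name, loaded arrays, stored arrays); illustrative kernels only
    kernels = [
        ("init   a[i] = s", 0, 1),
        ("copy   a[i] = b[i]", 1, 1),
        ("triad  a[i] = b[i] + s * c[i]", 2, 1),
    ]
    for name, loads, stores in kernels:
        print(name,
              bytes_per_iteration(loads, stores, True),
              bytes_per_iteration(loads, stores, False),
              round(wa_overhead(loads, stores), 3))
```

Under these assumptions, a store-only kernel moves twice as much data with write allocates as without them, and the relative penalty shrinks as the kernel reads more arrays per stored array. Avoiding the write-allocate, whether automatically in hardware or through explicit non-temporal stores, removes that extra read traffic.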