Boosting Earth System Model Outputs And Saving PetaBytes in their Storage Using Exascale Climate Emulators

Sameh Abdulah, Allison H. Baker, George Bosilca, Qinglei Cao, Stefano Castruccio, Marc G. Genton, David E. Keyes, Zubair Khalid, Hatem Ltaief, Yan Song, Georgiy L. Stenchikov, Ying Sun
arXiv - STAT - Computation, published 2024-08-08. DOI: arxiv-2408.04440 (https://doi.org/arxiv-2408.04440)

Abstract

We present the design and scalable implementation of an exascale climate emulator that addresses the escalating computational and storage requirements of high-resolution Earth System Model simulations. We use the spherical harmonic transform to stochastically model spatio-temporal variations in climate data. This provides tunable spatio-temporal resolution and significantly improves the fidelity and granularity of climate emulation, achieving an ultra-high spatial resolution of 0.034° (approximately 3.5 km). Our emulator, trained on 318 billion hourly temperature data points from a 35-year global simulation ensemble and 31 billion daily data points from an 83-year global simulation ensemble, generates statistically consistent climate emulations. We extend linear solver software to mixed-precision arithmetic on GPUs, applying different precisions within a single solver to adapt to different correlation strengths. The PaRSEC runtime system supports efficient parallel matrix operations by optimizing the dynamic balance between computation, communication, and memory requirements. Our BLAS3-rich code is optimized for systems equipped with four different families and generations of GPUs, scaling well to achieve 0.976 EFlop/s on 9,025 nodes (36,100 AMD MI250X multichip module (MCM) GPUs) of Frontier (nearly the full system), 0.739 EFlop/s on 1,936 nodes (7,744 Grace-Hopper Superchips (GH200)) of Alps, 0.243 EFlop/s on 1,024 nodes (4,096 A100 GPUs) of Leonardo, and 0.375 EFlop/s on 3,072 nodes (18,432 V100 GPUs) of Summit.
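To put the four scaling results on a common footing, the aggregate rates quoted above can be divided by the stated GPU counts. The sketch below does only that arithmetic; the names `RUNS`, `per_gpu_tflops`, and `summarize` are illustrative and not part of the authors' software, and the per-GPU figures are simple averages of the reported mixed-precision rates, not a statement about peak hardware throughput.

```python
# Figures quoted in the abstract: (system, aggregate EFlop/s, nodes, GPUs).
RUNS = [
    ("Frontier", 0.976, 9_025, 36_100),
    ("Alps", 0.739, 1_936, 7_744),
    ("Leonardo", 0.243, 1_024, 4_096),
    ("Summit", 0.375, 3_072, 18_432),
]


def per_gpu_tflops(eflops, gpus):
    """Average TFlop/s per GPU for an aggregate rate given in EFlop/s."""
    return eflops * 1e6 / gpus  # 1 EFlop/s = 1e6 TFlop/s


def summarize(runs=RUNS):
    """Return one dict per system with GPUs per node and average TFlop/s per GPU."""
    rows = []
    for name, eflops, nodes, gpus in runs:
        rows.append(
            {
                "system": name,
                "gpus_per_node": gpus / nodes,
                "tflops_per_gpu": per_gpu_tflops(eflops, gpus),
            }
        )
    return rows


if __name__ == "__main__":
    for row in summarize():
        print(
            f"{row['system']:<9} "
            f"{row['gpus_per_node']:.0f} GPUs/node  "
            f"{row['tflops_per_gpu']:.1f} TFlop/s per GPU"
        )
```

The node and GPU counts imply four GPUs per node on Frontier, Alps, and Leonardo and six on Summit. Because the solver mixes precisions, the per-GPU averages should not be compared directly against a single-precision or double-precision peak.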