Chayanon (Namo) Wichitrnithed, Woo-Sun Yang, Yun (Helen) He, Brad Richardson, Koichi Sakaguchi, Manuel Arenaz, William I. Gustafson Jr., Jacob Shpund, Ulises Costi Blanco, Alvaro Goldar Dieste
Published 2024-09-11 in arXiv - CS - Distributed, Parallel, and Cluster Computing. DOI: arxiv-2409.07232 (https://doi.org/arxiv-2409.07232)
Optimizing the Weather Research and Forecasting Model with OpenMP Offload and Codee
The Weather Research and Forecasting model (WRF) currently uses shared-memory
(OpenMP) and distributed-memory (MPI) parallelism. To take advantage of
GPU resources on the Perlmutter supercomputer at NERSC, we port parts of the
computationally expensive routines of the Fast Spectral Bin Microphysics (FSBM)
microphysical scheme to NVIDIA GPUs using OpenMP device offloading directives.
To facilitate this process, we explore an optimization workflow that uses
both runtime profilers and a static code inspection tool, Codee, to refactor the
subroutine. We observe a 2.08x overall speedup for the CONUS-12km thunderstorm
test case.