Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis: Latest Articles

The Simple Cloud-Resolving E3SM Atmosphere Model Running on the Frontier Exascale System

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-12 DOI: 10.1145/3581784.3627044

Mark Taylor, Peter M. Caldwell, Luca Bertagna, Conrad Clevenger, Aaron Donahue, J. Foucar, O. Guba, Benjamin Hillman, Noel Keen, Jayesh Krishna, Matthew Norman, S. Sreepathi, Christopher Terai, James B. White, A. Salinger, Renata B McCoy, L. R. Leung, David C. Bader, Danqing Wu

Citations: 0

Scaling the “Memory Wall” for Multi-Dimensional Seismic Processing with Algebraic Compression on Cerebras CS-2 Systems

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-11 DOI: 10.1145/3581784.3627042

H. Ltaief, Yuxi Hong, Leighton Wilson, Mathias Jacquelin, Matteo Ravasi, David E. Keyes

Abstract: We exploit the high memory bandwidth of AI-customized Cerebras CS-2 systems for seismic processing. By leveraging low-rank matrix approximation, we fit memory-hungry seismic applications onto memory-austere SRAM wafer-scale hardware, thus addressing a challenge arising in many wave-equation-based algorithms that rely on Multi-Dimensional Convolution (MDC) operators. Exploiting sparsity inherent in seismic data in the frequency domain, we implement embarrassingly parallel tile low-rank matrix-vector multiplications (TLR-MVM), which account for most of the elapsed time in MDC operations, to successfully solve the Multi-Dimensional Deconvolution (MDD) inverse problem. By reducing memory footprint along with arithmetic complexity, we fit a standard seismic benchmark dataset into the small local memories of Cerebras processing elements. Deploying TLR-MVM execution onto 48 CS-2 systems in support of MDD gives a sustained memory bandwidth of 92.58 PB/s on 35,784,000 processing elements, a significant milestone that highlights the capabilities of AI-customized architectures to enable a new generation of seismic algorithms that will empower multiple technologies of our low-carbon future.
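
The abstract above names tile low-rank matrix-vector multiplication (TLR-MVM) but does not show it. A minimal pure-Python sketch of the general idea follows, assuming each tile A_ij of the matrix is stored as two thin factors with A_ij ≈ U_ij V_ij^T. All function and variable names here are hypothetical, the data are small real numbers rather than frequency-domain seismic data, and nothing below reflects the authors' Cerebras implementation.

```python
# Tile low-rank matrix-vector multiply (TLR-MVM): illustrative sketch only.
# Every name here is hypothetical and chosen for this example.

def matvec(M, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def transpose(M):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def tlr_mvm(tiles, x, tile_size):
    """Compute y = A x where tile (i, j) of A is held as factors (U, V), A_ij ~= U V^T.

    tiles[i][j] = (U, V), with U and V both of shape tile_size x k (k = tile rank).
    """
    y = []
    for row_of_tiles in tiles:
        acc = [0.0] * tile_size
        for j, (U, V) in enumerate(row_of_tiles):
            x_j = x[j * tile_size:(j + 1) * tile_size]
            t = matvec(transpose(V), x_j)  # length-k vector: V^T x_j
            contrib = matvec(U, t)         # length-tile_size vector: U (V^T x_j)
            acc = [a + c for a, c in zip(acc, contrib)]
        y.extend(acc)
    return y

# Two rank-1 tiles of size 2 x 2, arranged in a 2 x 2 grid of tiles.
U1, V1 = [[1.0], [2.0]], [[3.0], [4.0]]   # U1 V1^T = [[3, 4], [6, 8]]
U2, V2 = [[1.0], [0.0]], [[0.0], [1.0]]   # U2 V2^T = [[0, 1], [0, 0]]
tiles = [[(U1, V1), (U2, V2)],
         [(U2, V2), (U1, V1)]]

y = tlr_mvm(tiles, [1.0, 1.0, 1.0, 1.0], tile_size=2)
print(y)  # [8.0, 14.0, 8.0, 14.0]
```

In this sketch a dense b x b tile would hold b*b values, while its rank-k factors hold 2*b*k, so the storage saving grows as k falls below b/2. Each output row block depends only on its own row of tiles, which is why the operation parallelizes so readily.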

Citations: 0

Large-Scale Materials Modeling at Quantum Accuracy: Ab Initio Simulations of Quasicrystals and Interacting Extended Defects in Metallic Alloys

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-11 DOI: 10.1145/3581784.3627037

Sambit Das, Bikash Kanungo, Vishal Subramanian, Gourab Panigrahi, P. Motamarri, David M. Rogers, Paul M. Zimmerman, V. Gavini

Abstract: Ab initio electronic-structure has remained dichotomous between achievable accuracy and length-scale. Quantum many-body (QMB) methods realize quantum accuracy but fail to scale. Density functional theory (DFT) scales favorably but remains far from quantum accuracy. We present a framework that breaks this dichotomy by use of three interconnected modules: (i) invDFT: a methodological advance in inverse DFT linking QMB methods to DFT; (ii) MLXC: a machine-learned density functional trained with invDFT data, commensurate with quantum accuracy; (iii) DFT-FE-MLXC: an adaptive higher-order spectral finite-element (FE) based DFT implementation that integrates MLXC with efficient solver strategies and HPC innovations in FE-specific dense linear algebra, mixed-precision algorithms, and asynchronous compute-communication. We demonstrate a paradigm shift in DFT that not only provides an accuracy commensurate with QMB methods in ground-state energies, but also attains an unprecedented performance of 659.7 PFLOPS (43.1% peak FP64 performance) on 619,124 electrons using 8,000 GPU nodes of Frontier supercomputer.
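
The performance figures quoted in the abstract above can be related by simple arithmetic. The short sketch below uses only the three numbers the abstract gives (659.7 PFLOPS sustained, 43.1% of FP64 peak, 8,000 nodes). The variable names are hypothetical, and the derived values are implied by those numbers rather than stated in the source.

```python
# Back-of-the-envelope check using only figures quoted in the abstract.
sustained_pflops = 659.7   # reported sustained performance
fraction_of_peak = 0.431   # reported fraction of FP64 peak (43.1%)
nodes = 8000               # reported number of GPU nodes

# Peak FP64 rate of the 8,000-node partition implied by the two figures above.
implied_peak_pflops = sustained_pflops / fraction_of_peak

# Per-node rates in TFLOPS (1 PFLOPS = 1000 TFLOPS).
sustained_per_node_tflops = sustained_pflops * 1000 / nodes
implied_peak_per_node_tflops = implied_peak_pflops * 1000 / nodes

print(f"implied partition peak: {implied_peak_pflops:.0f} PFLOPS")
print(f"sustained per node:     {sustained_per_node_tflops:.1f} TFLOPS")
print(f"implied peak per node:  {implied_peak_per_node_tflops:.0f} TFLOPS")
```

Under these assumptions the implied partition peak is roughly 1,531 PFLOPS, or about 191 TFLOPS per node.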

Citations: 0

I/O in WRF: A Case Study in Modern Parallel I/O Techniques

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-11 DOI: 10.1145/3581784.3613216

Zanhua Huang, Kai-yuan Hou, Ankit Agrawal, Alok N. Choudhary, Robert Ross, W. Liao

Citations: 0

Establishing a Modeling System in 3-km Horizontal Resolution for Global Atmospheric Circulation triggered by Submarine Volcanic Eruptions with 400 Billion Smoothed Particle Hydrodynamics

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-11 DOI: 10.1145/3581784.3627045

Shenghong Huang, Junshi Chen, Ziyu Zhang, Xiaoyu Hao, Jun Gu, Hong An, Chun Zhao, Yan Hu, Zhanming Wang, Longkui Chen, Yifan Luo, Jineng Yao, Yi Zhang, Yang Zhao, Zhihao Wang, Dongning Jia, Zhao Jin, Changming Song, Xisheng Luo, Xiaobin He, Dexun Chen

Abstract: People are increasingly concerned about how tectonic processes affect climate and vice versa. We establish a cross-sphere modeling system for volcanic eruptions and atmosphere circulation on a new Sunway supercomputer with a spatial resolution from 10 m locally to 3 km globally, using an improved multimedium and multiphase smoothed particle hydrodynamics (SPH) combined with a fully coupled meteorology-chemistry global atmospheric modeling scheme. We achieve 400 billion particles and 80% parallel efficiency using 39,000,000 processor cores. The simulation captures the whole dynamic process of the Tonga eruption from shock waves, earthquakes, tsunamis, mushroom clouds to the following 6-7 days of transport and diffusion of ash and water vapor, and preliminarily obtains the influence effect of full coupling of volcano, earthquake, ocean and atmosphere. This work is of great significance for deeply understanding the interaction between tectonic processes and climate change, and establishing an early warning simulation system for similar global hazard events.

Citations: 0

FORGE: Pre-Training Open Foundation Models for Science

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-11 DOI: 10.1145/3581784.3613215

Junqi Yin, Sajal Dash, Feiyi Wang, M. Shankar

Abstract: Large language models (LLMs) are poised to revolutionize the way we conduct scientific research. However, both model complexity and pre-training cost are impeding effective adoption for the wider science community. Identifying suitable scientific use cases, finding the optimal balance between model and data sizes, and scaling up model training are among the most pressing issues that need to be addressed. In this study, we provide practical solutions for building and using LLM-based foundation models targeting scientific research use cases. We present an end-to-end examination of the effectiveness of LLMs in scientific research, including their scaling behavior and computational requirements on Frontier, the first Exascale supercomputer. We have also developed for release to the scientific community a suite of open foundation models called FORGE with up to 26B parameters using 257B tokens from over 200M scientific articles, with performance either on par or superior to other state-of-the-art comparable models. We have demonstrated the use and effectiveness of FORGE on scientific downstream tasks. Our research establishes best practices that can be applied across various fields to take advantage of LLMs for scientific discovery.

Citations: 0

Big Data Assimilation: Real-time 30-second-refresh Heavy Rain Forecast Using Fugaku During Tokyo Olympics and Paralympics

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-11 DOI: 10.1145/3581784.3627047

T. Miyoshi, A. Amemiya, S. Otsuka, Y. Maejima, James Taylor, T. Honda, Hirofumi Tomita, S. Nishizawa, Kenta Sueki, T. Yamaura, Yutaka Ishikawa, Shinsuke Satoh, T. Ushio, K. Koike, Atsuya Uno

Abstract: Real-time 30-second-refresh numerical weather prediction (NWP) was performed with exclusive use of 11,580 nodes (~7%) of supercomputer Fugaku during Tokyo Olympics and Paralympics in 2021. Total 75,248 forecasts were disseminated in the 1-month period mostly stably with time-to-solution less than 3 minutes for 30-minute forecast. Japan's Big Data Assimilation (BDA) project developed the novel NWP system for precise prediction of hazardous rains toward solving the global climate crisis. Compared with typical 1-hour-refresh systems, the BDA system offered two orders of magnitude increase in problem size and revealed the effectiveness of 30-second refresh for highly nonlinear, rapidly evolving convective rains. To achieve the required time-to-solution for real-time 30-second refresh with high accuracy, the core BDA software incorporated single precision and enhanced parallel I/O with properly selected configurations of 1000 ensemble members and 500-m-mesh weather model. The massively parallel, I/O intensive real-time BDA computation demonstrated a promising future direction.

Citations: 0

Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection Through Unprecedented Spectral-Element Simulations

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-11 DOI: 10.1145/3581784.3627039

Niclas Jansson, Martin Karp, Adalberto Perez, T. Mukha, Yi Ju, Jiahui Liu, Szilárd Páll, Erwin Laure, T. Weinkauf, J. Schumacher, P. Schlatter, S. Markidis

Citations: 0

Exascale Multiphysics Nuclear Reactor Simulations for Advanced Designs

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-11-11 DOI: 10.1145/3581784.3627038

Elia Merzari, Steven Hamilton, Thomas Evans, M. Min, Paul F. Fischer, S. Kerkemeier, Jun Fang, Paul Romano, Yu-Hsiang Lan, Malachi Phillips, E. Biondo, K. Royston, Tim Warburton, Noel Chalmers, T. Rathnayake

Abstract: ENRICO is a coupled application developed under the U.S. Department of Energy's Exascale Computing Project (ECP) targeting the modeling of advanced nuclear reactors. It couples radiation transport with heat and fluid simulation, including the high-fidelity, high-resolution Monte Carlo code Shift and the computational fluid dynamics code NekRS. NekRS is a highly-performant open-source code for simulation of incompressible and low-Mach fluid flow, heat transfer, and combustion with a particular focus on turbulent flows in complex domains. It is based on rapidly convergent high-order spectral element discretizations that feature minimal numerical dissipation and dispersion. State-of-the-art multilevel preconditioners, efficient high-order time-splitting methods, and runtime-adaptive communication strategies are built on a fast OCCA-based kernel library, libParanumal, to provide scalability and portability across the spectrum of current and future high-performance computing platforms. On Frontier, Nek5000/RS has recently achieved an unprecedented milestone in breaching over 1 billion spectral elements and 350 billion degrees of freedom. Shift has demonstrated the capability to transport upwards of 1 billion particles per second in full core nuclear reactor simulations featuring complete temperature-dependent, continuous-energy physics on Frontier. Shift achieved a weak-scaling efficiency of 97.8% on 8192 nodes of Frontier and calculated 6 reactions in 214,896 fuel pin regions below 1% statistical error yielding first-of-a-kind resolution for a Monte Carlo transport application.

Citations: 0

HPAC-Offload: Accelerating HPC Applications with Portable Approximate Computing on the GPU

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2023-08-31 DOI: 10.48550/arXiv.2308.16877

Zane Fink, K. Parasyris, G. Georgakoudis, Harshitha Menon

Abstract: The end of Dennard scaling and the slowdown of Moore's law led to a shift in technology trends towards parallel architectures, particularly in HPC systems. To continue providing performance benefits, HPC should embrace Approximate Computing (AC), which trades application quality loss for improved performance. However, existing AC techniques have not been extensively applied and evaluated in state-of-the-art hardware architectures such as GPUs, the primary execution vehicle for HPC applications today. This paper presents HPAC-Offload, a pragma-based programming model that extends OpenMP offload applications to support AC techniques, allowing portable approximations across different GPU architectures. We conduct a comprehensive performance analysis of HPAC-Offload across GPU-accelerated HPC applications, revealing that AC techniques can significantly accelerate HPC applications (1.64x LULESH on AMD, 1.57x NVIDIA) with minimal quality loss (0.1%). Our analysis offers deep insights into the performance of GPU-based AC that guide the future development of AC algorithms and systems for these architectures.
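
The trade-off the abstract above describes, giving up a little output quality for less work, can be illustrated with loop perforation, a classic approximate-computing technique that skips a fraction of loop iterations. The abstract does not say which techniques HPAC-Offload provides or what its pragmas look like, so the sketch below is a plain Python analogy with hypothetical names, not an example of the HPAC-Offload programming model.

```python
# Loop perforation: do only every `stride`-th iteration and accept some error.
# Illustrative analogy only; names are hypothetical and unrelated to HPAC-Offload.

def mean_exact(values):
    """Reference result: visit every element."""
    return sum(values) / len(values)

def mean_perforated(values, stride):
    """Approximate result: visit only every `stride`-th element."""
    sampled = values[::stride]
    return sum(sampled) / len(sampled)

def relative_error(approx, exact):
    """Quality loss expressed as a fraction of the exact value."""
    return abs(approx - exact) / abs(exact)

data = [float(i) for i in range(1000)]

exact = mean_exact(data)             # 499.5, from 1000 elements
approx = mean_perforated(data, 4)    # 498.0, from 250 elements
err = relative_error(approx, exact)

print(f"exact={exact} approx={approx} relative error={err:.2%}")
```

Here a quarter of the iterations gives an answer within about 0.3% of the exact mean. How much error a real application tolerates, and how much speedup the skipped work yields on a GPU, depends on the workload and has to be measured.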

Citations: 0