Investigation of leading HPC I/O performance using a scientific-application derived benchmark

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07). Pub Date: 2007-11-16. DOI: 10.1145/1362622.1362636

J. Borrill, L. Oliker, J. Shalf, H. Shan


Citations: 67

Abstract

With the exponential growth of high-fidelity sensor and simulated data, the scientific community is increasingly reliant on ultrascale HPC resources to handle its data analysis requirements. However, to utilize such extreme computing power effectively, the I/O components must be designed in a balanced fashion, as any architectural bottleneck will quickly render the platform intolerably inefficient. To understand I/O performance of data-intensive applications in realistic computational settings, we develop a lightweight, portable benchmark called MADbench2, which is derived directly from a large-scale Cosmic Microwave Background (CMB) data analysis package. Our study represents one of the most comprehensive I/O analyses of modern parallel filesystems, examining a broad range of system architectures and configurations, including Lustre on the Cray XT3 and Intel Itanium2 cluster; GPFS on IBM Power5 and AMD Opteron platforms; two BlueGene/L installations utilizing GPFS and PVFS2 filesystems; and CXFS on the SGI Altix3700. We present extensive synchronous I/O performance data comparing a number of key parameters including concurrency, POSIX versus MPI-IO, and unique-file versus shared-file accesses, using both the default environment and highly tuned I/O parameters. Finally, we explore the potential of asynchronous I/O and quantify the volume of computation required to hide a given volume of I/O. Overall, our study quantifies the vast differences in performance and functionality of parallel filesystems across state-of-the-art platforms, while providing system designers and computational scientists with a lightweight tool for conducting further analyses.
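As a rough illustration of what "hiding" I/O behind computation means, the sketch below uses an idealized overlap model. It is not MADbench2 and not the paper's methodology. It assumes that synchronous I/O adds its time to the computation time, that perfectly overlapped asynchronous I/O costs only the larger of the two, and that bandwidth and compute rate are constant. All function names and the numbers in the example are hypothetical.

```python
# Idealized overlap model (an assumption for illustration, not the paper's method).

def io_time(bytes_moved: float, bandwidth_bytes_per_s: float) -> float:
    """Seconds needed to move `bytes_moved` at a constant bandwidth."""
    return bytes_moved / bandwidth_bytes_per_s


def sync_time(t_compute: float, t_io: float) -> float:
    """Synchronous I/O: computation waits for I/O, so the times add."""
    return t_compute + t_io


def async_time(t_compute: float, t_io: float) -> float:
    """Perfectly overlapped asynchronous I/O: the longer phase dominates."""
    return max(t_compute, t_io)


def flops_to_hide_io(bytes_moved: float,
                     bandwidth_bytes_per_s: float,
                     flop_rate: float) -> float:
    """Minimum floating-point work whose duration equals the I/O time.

    With at least this much computation available to overlap, the I/O
    is fully hidden under the model's assumptions.
    """
    return io_time(bytes_moved, bandwidth_bytes_per_s) * flop_rate


if __name__ == "__main__":
    # Hypothetical numbers: 4 GB moved at 1 GB/s on a 2 GFLOP/s process.
    t_io = io_time(4e9, 1e9)
    print("I/O time (s):", t_io)
    print("sync total with 3 s of compute:", sync_time(3.0, t_io))
    print("async total with 3 s of compute:", async_time(3.0, t_io))
    print("flops needed to hide I/O:", flops_to_hide_io(4e9, 1e9, 2e9))
```

In practice, overlap is rarely perfect, since asynchronous I/O can compete with computation for memory bandwidth and CPU time, so this model gives only a lower bound on the computation required.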



