Investigation of leading HPC I/O performance using a scientific-application derived benchmark

J. Borrill, L. Oliker, J. Shalf, H. Shan
{"title":"Investigation of leading HPC I/O performance using a scientific-application derived benchmark","authors":"J. Borrill, L. Oliker, J. Shalf, H. Shan","doi":"10.1145/1362622.1362636","DOIUrl":null,"url":null,"abstract":"With the exponential growth of high-fidelity sensor and simulated data, the scientific community is increasingly reliant on ultrascale HPC resources to handle their data analysis requirements. However, to utilize such extreme computing power effectively, the I/O components must be designed in a balanced fashion, as any architectural bottleneck will quickly render the platform intolerably inefficient. To understand I/O performance of data-intensive applications in realistic computational settings, we develop a lightweight, portable benchmark called MADbench2, which is derived directly from a large-scale Cosmic Microwave Background (CMB) data analysis package. Our study represents one of the most comprehensive I/O analyses of modern parallel filesystems, examining a broad range of system architectures and configurations, including Lustre on the Cray XT3 and Intel Itanium2 cluster; GPFS on IBM Power5 and AMD Opteron platforms; two BlueGene/L installations utilizing GPFS and PVFS2 filesystems; and CXFS on the SGI Altix3700. We present extensive synchronous I/O performance data comparing a number of key parameters including concurrency, POSIX- versus MPI-IO, and unique- versus shared-file accesses, using both the default environment as well as highly-tuned I/O parameters. Finally, we explore the potential of asynchronous I/O and quantify the volume of computation required to hide a given volume of I/O. Overall our study quantifies the vast differences in performance and functionality of parallel filesystems across state-of-the-art platforms, while providing system designers and computational scientists a lightweight tool for conducting further analyses.","PeriodicalId":274744,"journal":{"name":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1362622.1362636","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 67

Abstract

With the exponential growth of high-fidelity sensor and simulated data, the scientific community is increasingly reliant on ultrascale HPC resources to handle their data analysis requirements. However, to utilize such extreme computing power effectively, the I/O components must be designed in a balanced fashion, as any architectural bottleneck will quickly render the platform intolerably inefficient. To understand I/O performance of data-intensive applications in realistic computational settings, we develop a lightweight, portable benchmark called MADbench2, which is derived directly from a large-scale Cosmic Microwave Background (CMB) data analysis package. Our study represents one of the most comprehensive I/O analyses of modern parallel filesystems, examining a broad range of system architectures and configurations, including Lustre on the Cray XT3 and Intel Itanium2 cluster; GPFS on IBM Power5 and AMD Opteron platforms; two BlueGene/L installations utilizing GPFS and PVFS2 filesystems; and CXFS on the SGI Altix3700. We present extensive synchronous I/O performance data comparing a number of key parameters including concurrency, POSIX- versus MPI-IO, and unique- versus shared-file accesses, using both the default environment as well as highly-tuned I/O parameters. Finally, we explore the potential of asynchronous I/O and quantify the volume of computation required to hide a given volume of I/O. Overall our study quantifies the vast differences in performance and functionality of parallel filesystems across state-of-the-art platforms, while providing system designers and computational scientists a lightweight tool for conducting further analyses.
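To make the access patterns named above concrete, the following C/MPI sketch illustrates the three modes the study compares: unique-file POSIX I/O, shared-file collective MPI-IO, and asynchronous MPI-IO overlapped with computation. This is not code from MADbench2; file names, the per-process write size, and the overall structure are illustrative assumptions.

/*
 * Minimal sketch (not MADbench2) of the I/O patterns compared in the study:
 *   1. unique-file POSIX I/O   -- one file per process
 *   2. shared-file MPI-IO      -- one file, collective writes to disjoint regions
 *   3. asynchronous MPI-IO     -- non-blocking write overlapped with computation
 * Compile with an MPI compiler wrapper, e.g.: mpicc -O2 io_sketch.c -o io_sketch
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NELEMS (1 << 20)   /* doubles written per process (assumed size: 8 MiB) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *buf = malloc(NELEMS * sizeof(double));
    for (size_t i = 0; i < NELEMS; i++)
        buf[i] = (double)rank;

    /* 1. Unique-file POSIX I/O: each rank writes its own file. */
    char fname[64];
    snprintf(fname, sizeof fname, "posix_unique.%06d", rank);
    FILE *fp = fopen(fname, "wb");
    if (!fp) { perror(fname); MPI_Abort(MPI_COMM_WORLD, 1); }
    fwrite(buf, sizeof(double), NELEMS, fp);
    fclose(fp);

    /* 2. Shared-file MPI-IO: all ranks write disjoint regions of one file
     *    with a collective call, letting the MPI-IO layer optimize. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "mpiio_shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * NELEMS * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, NELEMS, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* 3. Asynchronous MPI-IO: post a non-blocking write, do computation while
     *    it is outstanding, then wait. With enough compute work between the
     *    post and the wait, the I/O cost can be hidden entirely. */
    MPI_Request req;
    MPI_File_open(MPI_COMM_WORLD, "mpiio_async.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_iwrite_at(fh, offset, buf, NELEMS, MPI_DOUBLE, &req);
    /* ... computation that overlaps the outstanding write would go here ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}

Timing each phase separately across varying process counts gives the kind of per-pattern throughput comparison the paper reports; MADbench2 embeds these same patterns within its CMB matrix computations rather than writing synthetic data.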