SCORPIO: A scalable two-phase parallel I/O library with application to a large scale subsurface simulator

S. Sreepathi, Vamsi Sripathi, R. Mills, Glenn Hammond, G. Mahinthakumar
{"title":"SCORPIO:一个可扩展的两相并行I/O库,可应用于大规模地下模拟器","authors":"S. Sreepathi, Vamsi Sripathiy, R. Mills, Glenn Hammondz, G. Mahinthakumar","doi":"10.1145/2148600.2148635","DOIUrl":null,"url":null,"abstract":"Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library called Scorpio that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.","PeriodicalId":206307,"journal":{"name":"20th Annual International Conference on High Performance Computing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"SCORPIO: A scalable two-phase parallel I/O library with application to a large scale subsurface simulator\",\"authors\":\"S. Sreepathi, Vamsi Sripathiy, R. Mills, Glenn Hammondz, G. Mahinthakumar\",\"doi\":\"10.1145/2148600.2148635\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. 
The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library called Scorpio that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.\",\"PeriodicalId\":206307,\"journal\":{\"name\":\"20th Annual International Conference on High Performance Computing\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"20th Annual International Conference on High Performance Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2148600.2148635\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"20th Annual International Conference on High Performance Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2148600.2148635","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 3

Abstract

Inefficient parallel I/O is known to be a major bottleneck for scientific applications running on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (such as Lustre) with a single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level, in which a set of designated processes participates in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general-purpose parallel I/O library called Scorpio that incorporates our optimized two-phase I/O approach. The library provides a simplified higher-level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5), and implements optimized I/O access patterns that can scale to larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before, with the added flexibility of being applicable to a wider range of I/O-intensive applications.
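To make the two-phase pattern concrete, below is a minimal sketch in C with MPI of the read path the abstract describes: the global communicator is split into sub-communicators, each group root performs the disk read for its group, and the data is then scattered to the other members. This illustrates the general technique, not the Scorpio API; the group size, file name, data layout, and use of plain binary I/O in place of HDF5 are all illustrative assumptions. The write path mirrors it, with an MPI_Gather into the group root followed by a single write.

```c
/* Sketch of two-phase I/O (read path): split the global communicator,
 * let one root rank per group do the disk I/O, then scatter the data.
 * Hypothetical example; group size and file layout are assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define IO_GROUP_SIZE 64   /* illustrative: ranks served by one I/O process */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int grank;
    MPI_Comm_rank(MPI_COMM_WORLD, &grank);

    /* Phase 0: split MPI_COMM_WORLD into I/O groups. Ranks sharing a
       color land in the same sub-communicator; its rank 0 is the
       designated I/O process for the group. */
    MPI_Comm io_comm;
    int color = grank / IO_GROUP_SIZE;
    MPI_Comm_split(MPI_COMM_WORLD, color, grank, &io_comm);

    int lrank, lsize;
    MPI_Comm_rank(io_comm, &lrank);
    MPI_Comm_size(io_comm, &lsize);

    const int chunk = 1024;                       /* elements per rank */
    double *mine = malloc(chunk * sizeof(double));
    double *buf  = NULL;

    /* Phase 1 (disk I/O): only the group root touches the file system,
       reading the contiguous block owned by its whole group. A real
       implementation would use HDF5 here; plain binary I/O keeps the
       sketch self-contained. */
    if (lrank == 0) {
        buf = calloc((size_t)lsize * chunk, sizeof(double));
        FILE *fp = fopen("dataset.bin", "rb");    /* hypothetical file */
        if (fp) {
            long offset = (long)color * lsize * chunk * (long)sizeof(double);
            fseek(fp, offset, SEEK_SET);
            size_t nread = fread(buf, sizeof(double),
                                 (size_t)lsize * chunk, fp);
            (void)nread;                          /* sketch: unchecked */
            fclose(fp);
        }
    }

    /* Phase 2 (communication): the root hands each rank its slice. */
    MPI_Scatter(buf, chunk, MPI_DOUBLE,
                mine, chunk, MPI_DOUBLE, 0, io_comm);

    free(buf);
    free(mine);
    MPI_Comm_free(&io_comm);
    MPI_Finalize();
    return 0;
}
```

The design point is that only one rank per group contends for the parallel file system, which is what relieves the single point of resource contention on file systems such as Lustre noted above, while the scatter/gather traffic stays on the much faster interconnect.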