SCORPIO: A scalable two-phase parallel I/O library with application to a large scale subsurface simulator

S. Sreepathi, Vamsi Sripathi, R. Mills, Glenn Hammond, G. Mahinthakumar
{"title":"SCORPIO:一个可扩展的两相并行I/O库,可应用于大规模地下模拟器","authors":"S. Sreepathi, Vamsi Sripathiy, R. Mills, Glenn Hammondz, G. Mahinthakumar","doi":"10.1145/2148600.2148635","DOIUrl":null,"url":null,"abstract":"Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library called Scorpio that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.","PeriodicalId":206307,"journal":{"name":"20th Annual International Conference on High Performance Computing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"SCORPIO: A scalable two-phase parallel I/O library with application to a large scale subsurface simulator\",\"authors\":\"S. Sreepathi, Vamsi Sripathiy, R. Mills, Glenn Hammondz, G. Mahinthakumar\",\"doi\":\"10.1145/2148600.2148635\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. 
The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library called Scorpio that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5) and implements optimized I/O access patterns that can scale on larger number of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.\",\"PeriodicalId\":206307,\"journal\":{\"name\":\"20th Annual International Conference on High Performance Computing\",\"volume\":\"30 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-11-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"20th Annual International Conference on High Performance Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2148600.2148635\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"20th Annual International Conference on High Performance Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2148600.2148635","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 3

Abstract

Inefficient parallel I/O is known to be a major bottleneck for scientific applications running on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (such as Lustre) with a single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level, in which a set of designated processes participates in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general-purpose parallel I/O library called Scorpio that incorporates our optimized two-phase I/O approach. The library provides a simplified higher-level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5), and implements optimized I/O access patterns that can scale to larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before, with the added flexibility of being applicable to a wider range of I/O-intensive applications.
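To make the two-phase pattern concrete, below is a minimal sketch in C with MPI of the read path the abstract describes: the global communicator is split into sub-communicators, each group root performs the disk read for its group, and the data is then scattered to the other members. This illustrates the general technique, not the Scorpio API; the group size, file name, data layout, and use of plain binary I/O in place of HDF5 are all illustrative assumptions. The write path mirrors it, with an MPI_Gather into the group root followed by a single write.

```c
/* Sketch of two-phase I/O (read path): split the global communicator,
 * let one root rank per group do the disk I/O, then scatter the data.
 * Hypothetical example; group size and file layout are assumptions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define IO_GROUP_SIZE 64   /* illustrative: ranks served by one I/O process */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int grank;
    MPI_Comm_rank(MPI_COMM_WORLD, &grank);

    /* Phase 0: split MPI_COMM_WORLD into I/O groups. Ranks sharing a
       color land in the same sub-communicator; its rank 0 is the
       designated I/O process for the group. */
    MPI_Comm io_comm;
    int color = grank / IO_GROUP_SIZE;
    MPI_Comm_split(MPI_COMM_WORLD, color, grank, &io_comm);

    int lrank, lsize;
    MPI_Comm_rank(io_comm, &lrank);
    MPI_Comm_size(io_comm, &lsize);

    const int chunk = 1024;                       /* elements per rank */
    double *mine = malloc(chunk * sizeof(double));
    double *buf  = NULL;

    /* Phase 1 (disk I/O): only the group root touches the file system,
       reading the contiguous block owned by its whole group. A real
       implementation would use HDF5 here; plain binary I/O keeps the
       sketch self-contained. */
    if (lrank == 0) {
        buf = calloc((size_t)lsize * chunk, sizeof(double));
        FILE *fp = fopen("dataset.bin", "rb");    /* hypothetical file */
        if (fp) {
            long offset = (long)color * lsize * chunk * (long)sizeof(double);
            fseek(fp, offset, SEEK_SET);
            size_t nread = fread(buf, sizeof(double),
                                 (size_t)lsize * chunk, fp);
            (void)nread;                          /* sketch: unchecked */
            fclose(fp);
        }
    }

    /* Phase 2 (communication): the root hands each rank its slice. */
    MPI_Scatter(buf, chunk, MPI_DOUBLE,
                mine, chunk, MPI_DOUBLE, 0, io_comm);

    free(buf);
    free(mine);
    MPI_Comm_free(&io_comm);
    MPI_Finalize();
    return 0;
}
```

The design point is that only one rank per group contends for the parallel file system, which is what relieves the single point of resource contention on file systems such as Lustre noted above, while the scatter/gather traffic stays on the much faster interconnect.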