Improving Collective MPI-IO Using Topology-Aware Stepwise Data Aggregation with I/O Throttling
Y. Tsujita, A. Hori, Toyohisa Kameyama, Atsuya Uno, F. Shoji, Y. Ishikawa
Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, 2018. DOI: 10.1145/3149457.3149464
Citations: 9
Abstract
MPI-IO is used as the internal I/O interface layer of HDF5 and PnetCDF, where collective MPI-IO plays a central role in parallel I/O for managing scientific data at very large scale. However, the existing collective MPI-IO optimization known as two-phase I/O has not been tuned sufficiently for recent supercomputers built on mesh/torus interconnects and very large parallel file systems, because it lacks topology awareness in data transfers and optimization for parallel file systems. In this paper, we propose I/O throttling and topology-aware stepwise data aggregation in the two-phase I/O of ROMIO, a representative MPI-IO library, to improve collective MPI-IO performance even when multiple processes run per compute node. Throttling I/O requests directed at the target file system mitigates I/O request contention and thereby improves performance in the file-access phase of two-phase I/O. A topology-aware aggregator layout that accounts for multiple aggregators per compute node alleviates contention in the data-aggregation phase, and stepwise data aggregation further improves aggregation performance. HPIO benchmark results on the K computer show that the proposed optimization achieves write-performance improvements of up to about 73% and 39% over the original implementation, using 12,288 and 24,576 processes on 3,072 and 6,144 compute nodes, respectively.
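The throttling idea described above can be illustrated with a minimal sketch: aggregators are divided into waves so that only a limited number of them issue file-system writes at the same time during the file-access phase of two-phase I/O. This is not ROMIO's actual implementation; the function name, the `throttle` parameter, and the wave-plus-barrier scheme are illustrative assumptions only.

```c
/*
 * Minimal sketch (assumed, not the paper's or ROMIO's code) of throttled
 * writes in the file-access phase of two-phase I/O: at most `throttle`
 * aggregators in `agg_comm` access the file system concurrently.
 */
#include <mpi.h>

void throttled_write(MPI_File fh, MPI_Comm agg_comm,
                     const void *buf, MPI_Offset offset, int count,
                     int throttle)
{
    int rank, nprocs;
    MPI_Comm_rank(agg_comm, &rank);
    MPI_Comm_size(agg_comm, &nprocs);

    /* Number of waves needed so that only `throttle` aggregators
       issue I/O requests at once. */
    int waves = (nprocs + throttle - 1) / throttle;

    for (int w = 0; w < waves; w++) {
        if (rank / throttle == w) {
            /* This aggregator belongs to the current wave: write the
               contiguous file domain gathered in the aggregation phase. */
            MPI_File_write_at(fh, offset, buf, count, MPI_BYTE,
                              MPI_STATUS_IGNORE);
        }
        /* Synchronize before the next wave starts issuing requests,
           keeping file-system contention bounded. */
        MPI_Barrier(agg_comm);
    }
}
```

In this sketch the barrier enforces the throttle strictly; a real implementation could overlap waves or use tokens, but the key point is the same: bounding the number of concurrent requests to the parallel file system reduces contention during the file-access phase.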