Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance

2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW). Pub Date: 2019-11-01. DOI: 10.1109/PDSW49588.2019.00007

Megha Agarwal, Divyansh Singhvi, Preeti Malakar, S. Byna


Citations: 10

Abstract

Parallel I/O is an indispensable part of scientific applications. The current parallel I/O stack contains many tunable parameters. Although changing these parameters can increase I/O performance many-fold, application developers usually resort to default values because tuning is cumbersome and requires expertise. We propose two auto-tuning models based on active learning that recommend a good set of parameter values (currently tested with Lustre parameters and MPI-IO hints) for an application on a given system. These models use Bayesian optimization to find parameter values by minimizing an objective function. The first model runs the application to determine these values, whereas the second model uses an I/O prediction model in place of application runs. The second model therefore needs significantly less training time than the first (e.g., 18 seconds instead of 800 seconds). Both models also provide the flexibility to focus on improving either read or write performance. To keep the tuning process generic, we have focused on both read and write performance. We have validated our models using an I/O benchmark (IOR) and three scientific application I/O kernels (S3D-IO, BT-IO and GenericIO) on two supercomputers (HPC2010 and Cori). Using the two models, we achieve an increase in I/O bandwidth of up to 11× over the default parameters. We obtained up to 3× improvement for 37 TB writes, corresponding to 1 billion particles in GenericIO. We also achieved up to 3.2× higher bandwidth for 4.8 TB of noncontiguous I/O in the BT-IO benchmark.
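The tuning loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name a surrogate model or an acquisition function, so the Gaussian-process surrogate and the lower-confidence-bound rule below are assumptions, as are the two-parameter search space (stripe count and stripe size), the `DEFAULT` configuration, and the synthetic objective `synthetic_cost`, which stands in for a measured (negated) bandwidth.

```python
import math

# Hypothetical search space: Lustre stripe count and stripe size in MB.
STRIPE_COUNTS = [1, 2, 4, 8, 16, 32, 64]
STRIPE_SIZES_MB = [1, 2, 4, 8, 16, 32]
CANDIDATES = [(c, s) for c in STRIPE_COUNTS for s in STRIPE_SIZES_MB]
DEFAULT = (1, 1)  # assumed default configuration, for illustration only


def synthetic_cost(config):
    """Made-up objective: negative bandwidth, so that lower is better."""
    c, s = config
    dist2 = (math.log2(c) - 4) ** 2 + (math.log2(s) - 3) ** 2
    return -1000.0 * math.exp(-dist2 / 8.0)


def features(config):
    return [math.log2(v) for v in config]


def rbf(a, b, length=2.0):
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def forward_sub(L, b):  # solves L y = b
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def back_sub(L, y):  # solves L^T x = y
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_fit(X, y, noise=1e-3):
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    return L, back_sub(L, forward_sub(L, y))


def gp_predict(L, alpha, X, x_new):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_star = [rbf(x, x_new) for x in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)  # rbf(x, x) == 1
    return mean, math.sqrt(var)


def tune(objective, budget=15, kappa=2.0):
    """Evaluate `budget` configurations, chosen one at a time by the surrogate."""
    evaluated = {DEFAULT: objective(DEFAULT)}
    while len(evaluated) < budget:
        configs = list(evaluated)
        costs = [evaluated[c] for c in configs]
        mu = sum(costs) / len(costs)
        sd = math.sqrt(sum((v - mu) ** 2 for v in costs) / len(costs)) or 1.0
        X = [features(c) for c in configs]
        L, alpha = gp_fit(X, [(v - mu) / sd for v in costs])

        def lcb(c):  # lower confidence bound: low mean or high uncertainty
            m, s = gp_predict(L, alpha, X, features(c))
            return m - kappa * s

        nxt = min((c for c in CANDIDATES if c not in evaluated), key=lcb)
        evaluated[nxt] = objective(nxt)
    best = min(evaluated, key=evaluated.get)
    return best, evaluated[best], evaluated


if __name__ == "__main__":
    best, best_cost, history = tune(synthetic_cost)
    print("best configuration:", best, "estimated bandwidth:", -best_cost)
```

In this sketch the `objective` argument is the only part that differs between the two models in the abstract: passing a function that actually runs the application corresponds to the first model, while passing a trained I/O prediction model corresponds to the second, which is where the reported reduction in training time would come from.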

