Active Learning-based Automatic Tuning and Prediction of Parallel I/O Performance

Megha Agarwal, Divyansh Singhvi, Preeti Malakar, S. Byna
2019 IEEE/ACM Fourth International Parallel Data Systems Workshop (PDSW), November 2019
DOI: 10.1109/PDSW49588.2019.00007
Citations: 10

Abstract

Parallel I/O is an indispensable part of scientific applications. The current parallel I/O stack contains many tunable parameters. While changing these parameters can increase I/O performance many-fold, application developers usually resort to default values because tuning is a cumbersome process and requires expertise. We propose two auto-tuning models based on active learning that recommend a good set of parameter values (currently tested with Lustre parameters and MPI-IO hints) for an application on a given system. These models use Bayesian optimization to find parameter values by minimizing an objective function. The first model runs the application to evaluate candidate values, whereas the second model uses an I/O prediction model instead, which reduces training time significantly (e.g., from 800 seconds to 18 seconds). Both models provide the flexibility to focus on improving either read or write performance; to keep the tuning process generic, we have addressed both. We have validated our models using an I/O benchmark (IOR) and three scientific application I/O kernels (S3D-IO, BT-IO, and GenericIO) on two supercomputers (HPC2010 and Cori). Using the two models, we achieve an increase in I/O bandwidth of up to 11× over the default parameters. We achieved up to 3× improvement for 37 TB writes, corresponding to 1 billion particles in GenericIO, and up to 3.2× higher bandwidth for 4.8 TB of noncontiguous I/O in the BT-IO benchmark.
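The tuners described above wrap an optimizer around either real benchmark runs (the first model) or an I/O prediction model (the second). The sketch below illustrates the overall active-learning loop in minimal, self-contained form; the parameter grid, the synthetic bandwidth function, and the nearest-neighbour acquisition rule are all illustrative stand-ins, not the paper's method — the paper uses a full Bayesian optimizer over real Lustre parameters and MPI-IO hints.

```python
import random

# Hypothetical search space standing in for Lustre stripe settings.
SEARCH_SPACE = [
    {"stripe_count": c, "stripe_size_mb": s}
    for c in (1, 4, 8, 16)
    for s in (1, 4, 16, 64)
]

def measure_io_bandwidth(cfg):
    """Stand-in objective. In the paper this is either a real benchmark
    run (model 1) or a trained I/O prediction model (model 2)."""
    # Synthetic response surface peaking at 8 stripes of 16 MB each.
    return (1000
            - 10 * (cfg["stripe_count"] - 8) ** 2
            - (cfg["stripe_size_mb"] - 16) ** 2)

def tune(budget=8, seed=0):
    """Active-learning loop: observe a few configs, then repeatedly pick
    the most promising unobserved config and measure it."""
    rng = random.Random(seed)
    observed = {}
    # Seed the model with two random configurations.
    for cfg in rng.sample(SEARCH_SPACE, 2):
        observed[tuple(cfg.values())] = measure_io_bandwidth(cfg)
    for _ in range(budget - 2):
        def score(cfg):
            # Acquisition stand-in: nearest observed bandwidth plus an
            # exploration bonus for being far from any observation
            # (a real Bayesian optimizer would use a surrogate model).
            key = tuple(cfg.values())
            d, v = min((abs(k[0] - key[0]) + abs(k[1] - key[1]), val)
                       for k, val in observed.items())
            return v + 5 * d
        candidates = [c for c in SEARCH_SPACE
                      if tuple(c.values()) not in observed]
        nxt = max(candidates, key=score)
        observed[tuple(nxt.values())] = measure_io_bandwidth(nxt)
    best = max(observed, key=observed.get)
    return dict(zip(("stripe_count", "stripe_size_mb"), best)), observed[best]
```

Swapping `measure_io_bandwidth` between a real application run and a cheap prediction model is exactly the distinction between the paper's two models: the loop structure stays the same, only the cost of each evaluation changes.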