Dynamic Model-Driven Parallel I/O Performance Tuning

Babak Behzad, S. Byna, Stefan M. Wild, Prabhat, M. Snir
{"title":"动态模型驱动并行I/O性能调优","authors":"Babak Behzad, S. Byna, Stefan M. Wild, Prabhat, M. Snir","doi":"10.1109/CLUSTER.2015.37","DOIUrl":null,"url":null,"abstract":"Parallel I/O performance depends highly on the interactions among multiple layers of the parallel I/O stack. The most common layers include high-level I/O libraries, MPI-IO middleware, and parallel file system. Each of these layers offers various tunable parameters to control intermediary data transfer points and the final data layout. Due to the interdependencies and the number of combinations of parameters, finding a good set of parameter values for a specific application's I/O pattern is challenging. Recent efforts, such as autotuning with genetic algorithms (GAs) and analytical models, have several limitations. For instance, analytical models fail to capture the dynamic nature of shared supercomputing systems and are application-specific. GA-based tuning requires running many time-consuming experiments for each input size. In this paper, we present a strategy to generate automatically an empirical model for a given application pattern. Using a set of real measurements from running an I/O kernel as training set, we generate a nonlinear regression model. We use this model to predict the top-20 tunable parameter values that give efficient I/O performance and rerun the I/O kernel to select the best set of parameter under the current conditions as tunable parameters for future runs of the same I/O kernel. Using this approach, we demonstrate 6X - 94X speedup over default I/O time for different I/O kernels running on multiple HPC systems. We also evaluate performance by identifying interdependencies among different sets of tunable parameters.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Dynamic Model-Driven Parallel I/O Performance Tuning\",\"authors\":\"Babak Behzad, S. Byna, Stefan M. Wild, Prabhat, M. Snir\",\"doi\":\"10.1109/CLUSTER.2015.37\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Parallel I/O performance depends highly on the interactions among multiple layers of the parallel I/O stack. The most common layers include high-level I/O libraries, MPI-IO middleware, and parallel file system. Each of these layers offers various tunable parameters to control intermediary data transfer points and the final data layout. Due to the interdependencies and the number of combinations of parameters, finding a good set of parameter values for a specific application's I/O pattern is challenging. Recent efforts, such as autotuning with genetic algorithms (GAs) and analytical models, have several limitations. For instance, analytical models fail to capture the dynamic nature of shared supercomputing systems and are application-specific. GA-based tuning requires running many time-consuming experiments for each input size. In this paper, we present a strategy to generate automatically an empirical model for a given application pattern. Using a set of real measurements from running an I/O kernel as training set, we generate a nonlinear regression model. 
We use this model to predict the top-20 tunable parameter values that give efficient I/O performance and rerun the I/O kernel to select the best set of parameter under the current conditions as tunable parameters for future runs of the same I/O kernel. Using this approach, we demonstrate 6X - 94X speedup over default I/O time for different I/O kernels running on multiple HPC systems. We also evaluate performance by identifying interdependencies among different sets of tunable parameters.\",\"PeriodicalId\":187042,\"journal\":{\"name\":\"2015 IEEE International Conference on Cluster Computing\",\"volume\":\"66 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-09-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 IEEE International Conference on Cluster Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CLUSTER.2015.37\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 IEEE International Conference on Cluster Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CLUSTER.2015.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 11

Abstract

Parallel I/O performance depends heavily on the interactions among multiple layers of the parallel I/O stack. The most common layers include high-level I/O libraries, MPI-IO middleware, and parallel file systems. Each of these layers offers various tunable parameters to control intermediary data transfer points and the final data layout. Due to the interdependencies and the number of combinations of parameters, finding a good set of parameter values for a specific application's I/O pattern is challenging. Recent efforts, such as autotuning with genetic algorithms (GAs) and analytical models, have several limitations. For instance, analytical models fail to capture the dynamic nature of shared supercomputing systems and are application-specific. GA-based tuning requires running many time-consuming experiments for each input size. In this paper, we present a strategy to automatically generate an empirical model for a given application pattern. Using a set of real measurements from running an I/O kernel as a training set, we generate a nonlinear regression model. We use this model to predict the top-20 tunable parameter values that give efficient I/O performance and rerun the I/O kernel to select the best set of parameters under the current conditions as tunable parameters for future runs of the same I/O kernel. Using this approach, we demonstrate a 6X-94X speedup over the default I/O time for different I/O kernels running on multiple HPC systems. We also evaluate performance by identifying interdependencies among different sets of tunable parameters.
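The sketch below illustrates the general tuning loop the abstract describes: fit a nonlinear regression model on measured (configuration, bandwidth) pairs from an I/O kernel, rank candidate configurations by predicted performance, and keep the top 20 for re-execution. It is a minimal illustration, not the authors' implementation; the parameter names and ranges (Lustre stripe settings, MPI-IO collective-buffering nodes, HDF5 alignment), the synthetic training data, and the quadratic least-squares model are assumptions chosen for brevity.

```python
# Minimal sketch of model-driven I/O tuning: train a regression model on measured
# runs, then rank candidate parameter configurations by predicted bandwidth.
# Parameter names/ranges and the quadratic model are illustrative assumptions.
import itertools
import numpy as np

# Hypothetical tunable parameters spanning the parallel I/O stack.
PARAM_SPACE = {
    "stripe_count":   [4, 8, 16, 32, 64],
    "stripe_size_mb": [1, 4, 16, 64],
    "cb_nodes":       [2, 4, 8, 16],
    "alignment_kb":   [64, 256, 1024],
}

def feature_vector(cfg):
    """Log-scale the parameters and add pairwise products so a linear
    least-squares fit can capture simple nonlinear interactions."""
    x = np.log2(np.array(cfg, dtype=float))
    pairs = [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]
    return np.concatenate(([1.0], x, pairs))  # bias + linear + quadratic terms

def fit_model(train_cfgs, train_bandwidth):
    """Fit regression coefficients from measured (configuration, bandwidth) pairs."""
    X = np.array([feature_vector(c) for c in train_cfgs])
    y = np.array(train_bandwidth)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def top_k_configs(coeffs, k=20):
    """Rank every configuration in the (small) search space by predicted bandwidth."""
    all_cfgs = list(itertools.product(*PARAM_SPACE.values()))
    preds = [float(feature_vector(c) @ coeffs) for c in all_cfgs]
    order = np.argsort(preds)[::-1]  # highest predicted bandwidth first
    return [all_cfgs[i] for i in order[:k]]

if __name__ == "__main__":
    # Stand-in for real training data: a handful of measured I/O-kernel runs.
    rng = np.random.default_rng(0)
    train_cfgs = [tuple(rng.choice(v) for v in PARAM_SPACE.values()) for _ in range(40)]
    train_bw = [50 * np.log2(c[0]) + 5 * np.log2(c[1] * c[2]) + rng.normal(0, 10)
                for c in train_cfgs]  # synthetic bandwidth in MB/s

    model = fit_model(train_cfgs, train_bw)
    candidates = top_k_configs(model, k=20)
    # In the paper's workflow, these 20 candidates would be re-executed with the
    # real I/O kernel and the fastest one kept for future runs of that kernel.
    print("Top predicted configuration:", dict(zip(PARAM_SPACE, candidates[0])))
```

In the real system, the re-execution step matters: predictions trained on past measurements cannot fully capture the current load on a shared machine, so the top-ranked candidates are measured again under current conditions before one is selected.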