Design strategies and approximation methods for high-performance computing variability management

IF 2.2 · Zone 2 (Engineering and Technology) · JCR Q2, ENGINEERING, INDUSTRIAL

Journal of Quality Technology · Published: 2022-01-24 · DOI: 10.1080/00224065.2022.2035285

Yueyao Wang, Li Xu, Yili Hong, Rong Pan, Tyler H. Chang, T. Lux, Jon Bernard, L. Watson, K. Cameron

Pages: 88–103

Citations: 2

Abstract

Performance variability management is an active research area in high-performance computing (HPC). In this article, we focus on input/output (I/O) variability, which is a complicated function affected by many system factors. To study performance variability, computer scientists often use grid-based designs (GBDs), which are equivalent to full factorial designs, to collect I/O variability data, and then use mathematical approximation methods to build a prediction model. As deterministic methods, mathematical approximation models can be biased, particularly when extrapolation is needed. In the statistics literature, space-filling designs (SFDs) and surrogate models such as the Gaussian process (GP) are popular for data collection and for building predictive models. The applicability of SFDs and surrogates in the HPC variability management setting, however, needs investigation. In this case study, we investigate their applicability in the HPC setting in terms of design efficiency, prediction accuracy, and scalability. We first customize existing SFDs so that they can be applied in the HPC setting. We then conduct a comprehensive investigation of design strategies and of the prediction ability of approximation methods, using both synthetic data simulated from three test functions and real data from the HPC setting, and compare the methods in terms of design efficiency, prediction accuracy, and scalability. In our synthetic and real data analyses, GP with SFDs performs best in most scenarios. With respect to the choice of approximation model, GP is recommended if the data are collected by SFDs; if the data are collected using GBDs, both GP and Delaunay interpolation can be considered. With the best choice of approximation method, the relative performance of SFDs and GBDs depends on the properties of the underlying surface. In the cases where SFDs perform better, SFDs need about half as many design points as the GBD, or fewer, to achieve the same prediction accuracy. Although we observe that the GBD can outperform SFDs for smooth underlying surfaces, the GBD does not scale to high-dimensional experimental regions. Therefore, SFDs, which can be tailored to high dimensions and non-smooth surfaces, are recommended, especially when a large number of input factors must be considered in the model. This article has online supplementary materials.
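The abstract's scalability argument rests on a simple count: a grid-based design needs one run per combination of factor levels, so its size is the product of the level counts and grows exponentially with the number of factors, while a space-filling design can use any run budget. The following Python sketch illustrates only that contrast, under stated assumptions. It uses a random-search maximin Latin hypercube as the SFD (one common SFD construction, not necessarily the one the authors customized). `snap_to_levels` and the factor levels in the demo are hypothetical stand-ins for adapting an SFD to discrete HPC factors. The sketch fits no GP or Delaunay model and says nothing about prediction accuracy.

```python
import itertools
import math
import random


def grid_design(levels):
    """Grid-based design (GBD): every combination of every factor's levels (full factorial)."""
    return [list(p) for p in itertools.product(*levels)]


def grid_size(levels):
    """Runs a GBD needs: the product of the level counts, exponential in the number of factors."""
    return math.prod(len(lv) for lv in levels)


def min_pairwise_distance(points):
    """Maximin criterion: the smallest Euclidean distance between any two design points."""
    return min(math.dist(a, b) for a, b in itertools.combinations(points, 2))


def maximin_lhs(n, d, n_candidates=200, seed=0):
    """SFD sketch (assumed construction): among random Latin hypercubes in [0, 1]^d,
    keep the one whose closest pair of points is farthest apart."""
    rng = random.Random(seed)
    best, best_score = None, -1.0
    for _ in range(n_candidates):
        # Each column is a permutation of the n strata, so every factor hits each stratum once.
        cols = [rng.sample(range(n), n) for _ in range(d)]
        pts = [[(cols[j][i] + 0.5) / n for j in range(d)] for i in range(n)]
        score = min_pairwise_distance(pts)
        if score > best_score:
            best, best_score = pts, score
    return best


def snap_to_levels(design, levels):
    """Hypothetical HPC adaptation: map each coordinate in [0, 1] to the nearest
    admissible level of a discrete factor. Snapping can create duplicate runs."""
    snapped = []
    for pt in design:
        row = []
        for u, lv in zip(pt, levels):
            lo, hi = min(lv), max(lv)
            target = lo + u * (hi - lo)
            row.append(min(lv, key=lambda v: abs(v - target)))
        snapped.append(row)
    return snapped


if __name__ == "__main__":
    # Hypothetical discrete factors, e.g. CPU frequency (GHz) and thread count.
    levels = [[1.2, 1.6, 2.0, 2.4, 2.8], [1, 2, 4, 8, 16]]
    print("GBD runs:", grid_size(levels))
    sfd = snap_to_levels(maximin_lhs(n=12, d=2, seed=1), levels)
    print("SFD runs:", len(sfd))
    for d in (2, 5, 8):
        print(d, "factors x 5 levels ->", grid_size([range(5)] * d), "grid runs")
```

With five levels per factor the grid grows as 5^d (25, 3,125 and 390,625 runs for d = 2, 5 and 8), whereas the SFD budget `n` is chosen independently of d. Because snapping to discrete levels can merge points, a practical version would likely add a deduplication or repair step.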


Source journal

Journal of Quality Technology (Management Science; Engineering, Industrial)

CiteScore: 5.20
Self-citation rate: 4.00%
Articles published: not listed
Review time: >12 weeks

About the journal: The objective of the Journal of Quality Technology is to contribute to the technical advancement of the field of quality technology by publishing papers that emphasize the practical applicability of new techniques, instructive examples of the operation of existing techniques, and results of historical research. Expository, review, and tutorial papers are also acceptable if they are written in a style suitable for practicing engineers.