A FACT-based Approach: Making Machine Learning Collective Autotuning Feasible on Exascale Systems

2021 Workshop on Exascale MPI (ExaMPI). Pub Date: 2021-11-01. DOI: 10.1109/ExaMPI54564.2021.00010

Michael Wilkins, Yanfei Guo, R. Thakur, N. Hardavellas, P. Dinda, Min Si


Citations: 3

Abstract

According to recent performance analyses, MPI collective operations make up a quarter of the execution time on production systems. Machine learning (ML) autotuners use supervised learning to select collective algorithms, significantly improving collective performance. However, we observe two barriers preventing their adoption over the default heuristic-based autotuners. First, a user may find it difficult to compare autotuners because we lack a methodology to quantify their performance. We call this the performance quantification challenge. Second, to obtain the advertised performance, ML model training requires benchmark data from a vast majority of the feature space. Collecting such data regularly on large-scale systems consumes far too much time and too many resources, and this will only get worse with exascale systems. We refer to this as the training data collection challenge. To address these challenges, we contribute (1) a performance evaluation framework to compare and improve collective autotuner designs and (2) the Feature scaling, Active learning, Converge, Tune hyperparameters (FACT) approach, a three-part methodology to minimize the training data collection time (and thus maximize practicality at larger scale) without sacrificing accuracy. In the methodology, we first preprocess feature and output values based on domain knowledge. Then, we use active learning to iteratively collect only the necessary training data points. Lastly, we perform hyperparameter tuning to further improve model accuracy without any additional data. On a production-scale system, our methodology produces a model of equal accuracy using 6.88x less training data collection time.
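The three steps named in the abstract (feature scaling, active learning until convergence, hyperparameter tuning) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which model, query strategy, or stopping rule the paper uses, so the k-nearest-neighbor classifier, the uncertainty-sampling rule, the `budget` cap, and the synthetic `run_benchmark` function below are all assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter
from itertools import product

def scale(nodes, msg_bytes):
    """Feature scaling: log2 spaces power-of-two sweeps evenly (an assumed choice)."""
    return (math.log2(nodes), math.log2(msg_bytes))

def run_benchmark(nodes, msg_bytes):
    """Hypothetical stand-in for a real collective benchmark run.
    Returns the label of the 'fastest' algorithm under a made-up rule."""
    return "ring" if msg_bytes * nodes >= 2 ** 20 else "tree"

def knn_votes(train, x, k):
    """Count the labels of the k training points nearest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(label for _, label in nearest)

def predict(train, x, k):
    return knn_votes(train, x, k).most_common(1)[0][0]

def uncertainty(train, x, k):
    """1 minus the majority vote share; 0.0 means the neighbors all agree."""
    votes = knn_votes(train, x, k)
    return 1.0 - votes.most_common(1)[0][1] / sum(votes.values())

def active_learn(pool, seed_points, k=3, budget=30):
    """Benchmark only the most uncertain pool points, stopping at convergence."""
    train = [(scale(*p), run_benchmark(*p)) for p in seed_points]
    unlabeled = [p for p in pool if p not in seed_points]
    while unlabeled and len(train) < budget:
        pick = max(unlabeled, key=lambda p: uncertainty(train, scale(*p), k))
        if uncertainty(train, scale(*pick), k) == 0.0:
            break  # converged: no remaining point has disagreeing neighbors
        train.append((scale(*pick), run_benchmark(*pick)))
        unlabeled.remove(pick)
    return train

def tune_k(train, candidates=(1, 3, 5)):
    """Hyperparameter tuning via leave-one-out accuracy; uses no new benchmark data."""
    def loo_accuracy(k):
        hits = sum(predict(train[:i] + train[i + 1:], x, k) == y
                   for i, (x, y) in enumerate(train))
        return hits / len(train)
    return max(candidates, key=loo_accuracy)

# Candidate configurations: (node count, message size in bytes), all powers of two.
pool = list(product([2 ** i for i in range(1, 8)], [2 ** j for j in range(3, 23)]))
seeds = [pool[0], pool[-1], pool[len(pool) // 2], pool[19], pool[120]]
train = active_learn(pool, seeds)
best_k = tune_k(train)
accuracy = sum(predict(train, scale(*p), best_k) == run_benchmark(*p)
               for p in pool) / len(pool)
print(f"benchmarked {len(train)} of {len(pool)} points, k={best_k}, accuracy={accuracy:.2f}")
```

The point of the sketch is the shape of the loop: each iteration spends benchmark time only on the configuration the current model is least sure about, and the final tuning step reuses the data already collected. On a real system, `run_benchmark` would be replaced by actual timed collective runs, and the accuracy check against the full pool would not be available.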
