Predicting How CNN Training Time Changes on Various Mini-Batch Sizes by Considering Convolution Algorithms and Non-GPU Time

Peter Bryzgalov, T. Maeda, Yutaro Shigeto
{"title":"Predicting How CNN Training Time Changes on Various Mini-Batch Sizes by Considering Convolution Algorithms and Non-GPU Time","authors":"Peter Bryzgalov, T. Maeda, Yutaro Shigeto","doi":"10.1145/3452412.3462750","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNN) drive successful machine learning applications in a growing number of areas. However, training a CNN may take a massive amount of time and expensive high-end GPU resources. CNN training time may change significantly depending on training parameters and GPU type. Therefore, an accurate estimation of CNN training time can help in selecting training parameters and GPU type, which minimise training time and cost. We focus on one training parameter, which has a particularly significant effect on the training time-the mini-batch size. Predicting CNN training time on a wide range of mini-batch sizes is challenging because a small variation in a mini-batch size can change the selection of convolution algorithms and cause abrupt changes in training time, which is also affected by non-GPU operations. This paper shows our approach to predicting CNN training time over a wide range of mini-batch sizes by utilising a proxy application to benchmark convolutional and dense layers and considering non-GPU time. In contrast to prior works, which build one prediction model for all possible CNN configurations, we build simple models that would each make highly accurate predictions for one particular CNN. We evaluate our approach using several CNN samples and GPU types and demonstrate that it can yield highly accurate predictions on unseen mini-batch sizes with a mean percentage error averaged over all experiments equal to 1.38% (the minimum is 0.21% and the maximum is 5.01%).","PeriodicalId":342766,"journal":{"name":"Proceedings of the 2021 on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3452412.3462750","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Convolutional neural networks (CNNs) drive successful machine learning applications in a growing number of areas. However, training a CNN may require a massive amount of time and expensive high-end GPU resources. CNN training time may change significantly depending on the training parameters and GPU type. Therefore, an accurate estimate of CNN training time can help in selecting the training parameters and GPU type that minimise training time and cost. We focus on one training parameter with a particularly significant effect on training time: the mini-batch size. Predicting CNN training time over a wide range of mini-batch sizes is challenging because a small variation in mini-batch size can change the selection of convolution algorithms and cause abrupt changes in training time, which is also affected by non-GPU operations. This paper presents our approach to predicting CNN training time over a wide range of mini-batch sizes by utilising a proxy application to benchmark convolutional and dense layers and by accounting for non-GPU time. In contrast to prior works, which build one prediction model for all possible CNN configurations, we build simple models that each make highly accurate predictions for one particular CNN. We evaluate our approach on several CNN samples and GPU types and demonstrate that it yields highly accurate predictions for unseen mini-batch sizes, with a mean percentage error averaged over all experiments of 1.38% (minimum 0.21%, maximum 5.01%).
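The paper's proxy application is not reproduced here, but the following is a minimal sketch of the kind of per-layer benchmarking the abstract describes: timing the forward and backward passes of a single convolutional layer across mini-batch sizes with PyTorch CUDA events. The layer dimensions, warm-up and iteration counts, and batch-size list are illustrative assumptions, not values from the paper.

import torch
import torch.nn as nn

def time_conv_layer(batch_size, in_ch=64, out_ch=128, hw=56, iters=50):
    """Average forward+backward time (ms) of one conv layer at a given batch size."""
    device = torch.device("cuda")
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1).to(device)
    x = torch.randn(batch_size, in_ch, hw, hw, device=device)

    # Warm-up iterations let cuDNN select a convolution algorithm for this
    # exact input shape; that selection can change with the mini-batch size,
    # which is one source of the abrupt jumps in training time noted above.
    for _ in range(5):
        conv(x).sum().backward()
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        conv(x).sum().backward()
    end.record()
    torch.cuda.synchronize()  # wait for all queued GPU work to finish
    return start.elapsed_time(end) / iters

if __name__ == "__main__":
    torch.backends.cudnn.benchmark = True  # let cuDNN pick per-shape algorithms
    for bs in (8, 16, 32, 48, 64, 96, 128):
        print(f"mini-batch {bs:4d}: {time_conv_layer(bs):8.2f} ms/iteration")

Under the paper's framing, a per-CNN prediction would then combine such per-layer GPU measurements with an estimate of non-GPU time, roughly T(m) = sum over layers of t_layer(m) + t_non-GPU(m) for mini-batch size m. The algorithm-induced jumps in t_layer(m) are precisely why a single smooth model over all mini-batch sizes is insufficient.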