Predicting How CNN Training Time Changes on Various Mini-Batch Sizes by Considering Convolution Algorithms and Non-GPU Time

Peter Bryzgalov, T. Maeda, Yutaro Shigeto
{"title":"Predicting How CNN Training Time Changes on Various Mini-Batch Sizes by Considering Convolution Algorithms and Non-GPU Time","authors":"Peter Bryzgalov, T. Maeda, Yutaro Shigeto","doi":"10.1145/3452412.3462750","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNN) drive successful machine learning applications in a growing number of areas. However, training a CNN may take a massive amount of time and expensive high-end GPU resources. CNN training time may change significantly depending on training parameters and GPU type. Therefore, an accurate estimation of CNN training time can help in selecting training parameters and GPU type, which minimise training time and cost. We focus on one training parameter, which has a particularly significant effect on the training time-the mini-batch size. Predicting CNN training time on a wide range of mini-batch sizes is challenging because a small variation in a mini-batch size can change the selection of convolution algorithms and cause abrupt changes in training time, which is also affected by non-GPU operations. This paper shows our approach to predicting CNN training time over a wide range of mini-batch sizes by utilising a proxy application to benchmark convolutional and dense layers and considering non-GPU time. In contrast to prior works, which build one prediction model for all possible CNN configurations, we build simple models that would each make highly accurate predictions for one particular CNN. We evaluate our approach using several CNN samples and GPU types and demonstrate that it can yield highly accurate predictions on unseen mini-batch sizes with a mean percentage error averaged over all experiments equal to 1.38% (the minimum is 0.21% and the maximum is 5.01%).","PeriodicalId":342766,"journal":{"name":"Proceedings of the 2021 on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy","volume":"34 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2021 on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3452412.3462750","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Convolutional neural networks (CNNs) drive successful machine learning applications in a growing number of areas. However, training a CNN may require a massive amount of time and expensive high-end GPU resources. CNN training time may change significantly depending on the training parameters and GPU type. Therefore, an accurate estimate of CNN training time can help in selecting the training parameters and GPU type that minimise training time and cost. We focus on one training parameter with a particularly significant effect on training time: the mini-batch size. Predicting CNN training time over a wide range of mini-batch sizes is challenging because a small variation in mini-batch size can change the selection of convolution algorithms and cause abrupt changes in training time, which is also affected by non-GPU operations. This paper presents our approach to predicting CNN training time over a wide range of mini-batch sizes by utilising a proxy application to benchmark convolutional and dense layers and by accounting for non-GPU time. In contrast to prior works, which build one prediction model for all possible CNN configurations, we build simple models that each make highly accurate predictions for one particular CNN. We evaluate our approach on several CNN samples and GPU types and demonstrate that it yields highly accurate predictions for unseen mini-batch sizes, with a mean percentage error averaged over all experiments of 1.38% (minimum 0.21%, maximum 5.01%).
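The paper's proxy application is not reproduced here, but the following is a minimal sketch of the kind of per-layer benchmarking the abstract describes: timing the forward and backward passes of a single convolutional layer across mini-batch sizes with PyTorch CUDA events. The layer dimensions, warm-up and iteration counts, and batch-size list are illustrative assumptions, not values from the paper.

import torch
import torch.nn as nn

def time_conv_layer(batch_size, in_ch=64, out_ch=128, hw=56, iters=50):
    """Average forward+backward time (ms) of one conv layer at a given batch size."""
    device = torch.device("cuda")
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1).to(device)
    x = torch.randn(batch_size, in_ch, hw, hw, device=device)

    # Warm-up iterations let cuDNN select a convolution algorithm for this
    # exact input shape; that selection can change with the mini-batch size,
    # which is one source of the abrupt jumps in training time noted above.
    for _ in range(5):
        conv(x).sum().backward()
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        conv(x).sum().backward()
    end.record()
    torch.cuda.synchronize()  # wait for all queued GPU work to finish
    return start.elapsed_time(end) / iters

if __name__ == "__main__":
    torch.backends.cudnn.benchmark = True  # let cuDNN pick per-shape algorithms
    for bs in (8, 16, 32, 48, 64, 96, 128):
        print(f"mini-batch {bs:4d}: {time_conv_layer(bs):8.2f} ms/iteration")

Under the paper's framing, a per-CNN prediction would then combine such per-layer GPU measurements with an estimate of non-GPU time, roughly T(m) = sum over layers of t_layer(m) + t_non-GPU(m) for mini-batch size m. The algorithm-induced jumps in t_layer(m) are precisely why a single smooth model over all mini-batch sizes is insufficient.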