An Empirical Study of the Impact of Hyperparameter Tuning and Model Optimization on the Performance Properties of Deep Neural Networks

Lizhi Liao, Heng Li, Weiyi Shang, L. Ma
{"title":"超参数整定和模型优化对深度神经网络性能影响的实证研究","authors":"Lizhi Liao, Heng Li, Weiyi Shang, L. Ma","doi":"10.1145/3506695","DOIUrl":null,"url":null,"abstract":"Deep neural network (DNN) models typically have many hyperparameters that can be configured to achieve optimal performance on a particular dataset. Practitioners usually tune the hyperparameters of their DNN models by training a number of trial models with different configurations of the hyperparameters, to find the optimal hyperparameter configuration that maximizes the training accuracy or minimizes the training loss. As such hyperparameter tuning usually focuses on the model accuracy or the loss function, it is not clear and remains under-explored how the process impacts other performance properties of DNN models, such as inference latency and model size. On the other hand, standard DNN models are often large in size and computing-intensive, prohibiting them from being directly deployed in resource-bounded environments such as mobile devices and Internet of Things (IoT) devices. To tackle this problem, various model optimization techniques (e.g., pruning or quantization) are proposed to make DNN models smaller and less computing-intensive so that they are better suited for resource-bounded environments. However, it is neither clear how the model optimization techniques impact other performance properties of DNN models such as inference latency and battery consumption, nor how the model optimization techniques impact the effect of hyperparameter tuning (i.e., the compounding effect). Therefore, in this paper, we perform a comprehensive study on four representative and widely-adopted DNN models, i.e., CNN image classification, Resnet-50, CNN text classification, and LSTM sentiment classification, to investigate how different DNN model hyperparameters affect the standard DNN models, as well as how the hyperparameter tuning combined with model optimization affect the optimized DNN models, in terms of various performance properties (e.g., inference latency or battery consumption). Our empirical results indicate that tuning specific hyperparameters has heterogeneous impact on the performance of DNN models across different models and different performance properties. In particular, although the top tuned DNN models usually have very similar accuracy, they may have significantly different performance in terms of other aspects (e.g., inference latency). We also observe that model optimization has a confounding effect on the impact of hyperparameters on DNN model performance. For example, two sets of hyperparameters may result in standard models with similar performance but their performance may become significantly different after they are optimized and deployed on the mobile device. Our findings highlight that practitioners can benefit from paying attention to a variety of performance properties and the confounding effect of model optimization when tuning and optimizing their DNN models.","PeriodicalId":7398,"journal":{"name":"ACM Transactions on Software Engineering and Methodology (TOSEM)","volume":"111 1","pages":"1 - 40"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"An Empirical Study of the Impact of Hyperparameter Tuning and Model Optimization on the Performance Properties of Deep Neural Networks\",\"authors\":\"Lizhi Liao, Heng Li, Weiyi Shang, L. 
Ma\",\"doi\":\"10.1145/3506695\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep neural network (DNN) models typically have many hyperparameters that can be configured to achieve optimal performance on a particular dataset. Practitioners usually tune the hyperparameters of their DNN models by training a number of trial models with different configurations of the hyperparameters, to find the optimal hyperparameter configuration that maximizes the training accuracy or minimizes the training loss. As such hyperparameter tuning usually focuses on the model accuracy or the loss function, it is not clear and remains under-explored how the process impacts other performance properties of DNN models, such as inference latency and model size. On the other hand, standard DNN models are often large in size and computing-intensive, prohibiting them from being directly deployed in resource-bounded environments such as mobile devices and Internet of Things (IoT) devices. To tackle this problem, various model optimization techniques (e.g., pruning or quantization) are proposed to make DNN models smaller and less computing-intensive so that they are better suited for resource-bounded environments. However, it is neither clear how the model optimization techniques impact other performance properties of DNN models such as inference latency and battery consumption, nor how the model optimization techniques impact the effect of hyperparameter tuning (i.e., the compounding effect). Therefore, in this paper, we perform a comprehensive study on four representative and widely-adopted DNN models, i.e., CNN image classification, Resnet-50, CNN text classification, and LSTM sentiment classification, to investigate how different DNN model hyperparameters affect the standard DNN models, as well as how the hyperparameter tuning combined with model optimization affect the optimized DNN models, in terms of various performance properties (e.g., inference latency or battery consumption). Our empirical results indicate that tuning specific hyperparameters has heterogeneous impact on the performance of DNN models across different models and different performance properties. In particular, although the top tuned DNN models usually have very similar accuracy, they may have significantly different performance in terms of other aspects (e.g., inference latency). We also observe that model optimization has a confounding effect on the impact of hyperparameters on DNN model performance. For example, two sets of hyperparameters may result in standard models with similar performance but their performance may become significantly different after they are optimized and deployed on the mobile device. 
Our findings highlight that practitioners can benefit from paying attention to a variety of performance properties and the confounding effect of model optimization when tuning and optimizing their DNN models.\",\"PeriodicalId\":7398,\"journal\":{\"name\":\"ACM Transactions on Software Engineering and Methodology (TOSEM)\",\"volume\":\"111 1\",\"pages\":\"1 - 40\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Software Engineering and Methodology (TOSEM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3506695\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology (TOSEM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3506695","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 20

Abstract

Deep neural network (DNN) models typically have many hyperparameters that can be configured to achieve optimal performance on a particular dataset. Practitioners usually tune the hyperparameters of their DNN models by training a number of trial models with different configurations of the hyperparameters, to find the optimal hyperparameter configuration that maximizes the training accuracy or minimizes the training loss. As such hyperparameter tuning usually focuses on the model accuracy or the loss function, it is not clear and remains under-explored how the process impacts other performance properties of DNN models, such as inference latency and model size. On the other hand, standard DNN models are often large in size and computing-intensive, prohibiting them from being directly deployed in resource-bounded environments such as mobile devices and Internet of Things (IoT) devices. To tackle this problem, various model optimization techniques (e.g., pruning or quantization) are proposed to make DNN models smaller and less computing-intensive so that they are better suited for resource-bounded environments. However, it is neither clear how the model optimization techniques impact other performance properties of DNN models such as inference latency and battery consumption, nor how the model optimization techniques impact the effect of hyperparameter tuning (i.e., the compounding effect). Therefore, in this paper, we perform a comprehensive study on four representative and widely-adopted DNN models, i.e., CNN image classification, Resnet-50, CNN text classification, and LSTM sentiment classification, to investigate how different DNN model hyperparameters affect the standard DNN models, as well as how the hyperparameter tuning combined with model optimization affect the optimized DNN models, in terms of various performance properties (e.g., inference latency or battery consumption). Our empirical results indicate that tuning specific hyperparameters has heterogeneous impact on the performance of DNN models across different models and different performance properties. In particular, although the top tuned DNN models usually have very similar accuracy, they may have significantly different performance in terms of other aspects (e.g., inference latency). We also observe that model optimization has a confounding effect on the impact of hyperparameters on DNN model performance. For example, two sets of hyperparameters may result in standard models with similar performance but their performance may become significantly different after they are optimized and deployed on the mobile device. Our findings highlight that practitioners can benefit from paying attention to a variety of performance properties and the confounding effect of model optimization when tuning and optimizing their DNN models.
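
To make the study design concrete, the sketch below mirrors it at toy scale: it grid-searches two hyperparameters, applies post-training dynamic-range quantization with TensorFlow Lite as the model-optimization step, and records accuracy, model size, and inference latency rather than accuracy alone. This is a minimal illustration, not the paper's experimental harness: the small CNN, the MNIST stand-in dataset, the two-value grid, and the host-CPU latency measurement (the paper measures on mobile devices) are all assumptions made for brevity.

```python
# Minimal sketch (not the paper's harness): grid-search two hyperparameters,
# then compare each trial model's float32 and quantized TFLite variants on
# accuracy, model size, and inference latency -- not accuracy alone.
import itertools
import time

import numpy as np
import tensorflow as tf

# MNIST as a stand-in dataset; the paper uses larger models and datasets.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., np.newaxis].astype("float32") / 255.0
x_test = x_test[..., np.newaxis].astype("float32") / 255.0


def build_model(filters: int, learning_rate: float) -> tf.keras.Model:
    """A toy CNN whose hyperparameters (filters, learning rate) we tune."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(filters, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def tflite_latency_ms(tflite_bytes: bytes, sample: np.ndarray, runs: int = 100) -> float:
    """Average single-input inference latency of a TFLite model, in ms (host CPU)."""
    interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], sample)
    start = time.perf_counter()
    for _ in range(runs):
        interpreter.invoke()
    return (time.perf_counter() - start) / runs * 1000


sample = x_test[:1]  # one input for latency measurement
for filters, lr in itertools.product([16, 32], [1e-3, 1e-2]):
    model = build_model(filters, lr)
    model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)
    _, acc = model.evaluate(x_test, y_test, verbose=0)

    # Convert twice: a float32 baseline and a dynamic-range-quantized variant
    # (one of the model-optimization techniques the paper studies).
    base = tf.lite.TFLiteConverter.from_keras_model(model).convert()
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    quant = converter.convert()

    print(f"filters={filters} lr={lr}: accuracy={acc:.4f} | "
          f"float32: {len(base) / 1024:.0f} KiB, "
          f"{tflite_latency_ms(base, sample):.2f} ms | "
          f"quantized: {len(quant) / 1024:.0f} KiB, "
          f"{tflite_latency_ms(quant, sample):.2f} ms")
```

Comparing the float32 and quantized columns across grid points is where the confounding effect described above would surface: two hyperparameter configurations with near-identical accuracy can diverge noticeably in size or latency once optimized.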