Neural architecture search via standard machine learning methodologies

Giorgia Franchini, V. Ruggiero, F. Porta, L. Zanni
{"title":"Neural architecture search via standard machine learning methodologies","authors":"Giorgia Franchini, V. Ruggiero, F. Porta, L. Zanni","doi":"10.3934/mine.2023012","DOIUrl":null,"url":null,"abstract":"In the context of deep learning, the more expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation with the execution of a significant number of experimental trials. The aim of the paper is to investigate how to choose the hyperparameters related to both the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the potential adoption of variance reduction techniques. The main contribution of the paper consists in introducing an automatic Machine Learning technique to set these hyperparameters in such a way that a measure of the CNN performance can be optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN, based on its behavior after only few steps of the training process. To achieve this goal, we generate a dataset whose input samples are provided by a limited number of hyperparameter configurations together with the corresponding CNN measures of performance obtained with only few steps of the CNN training process, while the label of each input sample is the performance corresponding to a complete training of the CNN. Such dataset is used as training set for a Support Vector Machines for Regression and/or Random Forest techniques to predict the performance of the considered learning methodology, given its performance at the initial iterations of its learning process. Furthermore, by a probabilistic exploration of the hyperparameter space, we are able to find, at a quite low cost, the setting of a CNN hyperparameters which provides the optimal performance. The results of an extensive numerical experimentation, carried out on CNNs, together with the use of our performance predictor with NAS-Bench-101, highlight how the proposed methodology for the hyperparameter setting appears very promising.","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":null,"pages":null},"PeriodicalIF":16.4000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.3934/mine.2023012","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 14

Abstract

In the context of deep learning, the most expensive computational phase is the full training of the learning methodology. Indeed, its effectiveness depends on the choice of proper values for the so-called hyperparameters, namely the parameters that are not trained during the learning process, and such a selection typically requires an extensive numerical investigation with the execution of a significant number of experimental trials. The aim of the paper is to investigate how to choose the hyperparameters related both to the architecture of a Convolutional Neural Network (CNN), such as the number of filters and the kernel size at each convolutional layer, and to the optimisation algorithm employed to train the CNN itself, such as the steplength, the mini-batch size and the potential adoption of variance reduction techniques. The main contribution of the paper consists in introducing an automatic Machine Learning technique to set these hyperparameters in such a way that a measure of the CNN performance can be optimised. In particular, given a set of values for the hyperparameters, we propose a low-cost strategy to predict the performance of the corresponding CNN, based on its behavior after only a few steps of the training process. To achieve this goal, we generate a dataset whose input samples are provided by a limited number of hyperparameter configurations, together with the corresponding CNN measures of performance obtained with only a few steps of the CNN training process, while the label of each input sample is the performance corresponding to a complete training of the CNN. Such a dataset is then used as the training set for Support Vector Machine for Regression and/or Random Forest techniques, which predict the performance of the considered learning methodology given its performance at the initial iterations of its learning process. Furthermore, by a probabilistic exploration of the hyperparameter space, we are able to find, at a quite low cost, the setting of the CNN hyperparameters which provides the optimal performance. The results of an extensive numerical experimentation carried out on CNNs, together with the use of our performance predictor on NAS-Bench-101, highlight how the proposed methodology for hyperparameter setting appears very promising.
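To make the pipeline concrete, below is a minimal Python sketch of the two ingredients the abstract describes: a regressor (SVR and/or Random Forest, here via scikit-learn) mapping a hyperparameter configuration plus the CNN's early-training performance to its final performance, followed by a randomised exploration of the hyperparameter space screened by that predictor. This is not the authors' code: `early_training_accuracy` and `full_training_accuracy` are hypothetical placeholders that, in the real setting, would run a few steps or a complete training of the CNN, and the hyperparameter ranges are assumptions for illustration only.

```python
# Sketch (not the paper's implementation) of the low-cost performance
# predictor and the predictor-guided hyperparameter search.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)

def sample_configuration():
    """Draw one hyperparameter configuration: architecture choices
    (filters, kernel size) and optimiser choices (steplength, batch size).
    Ranges are illustrative assumptions."""
    return np.array([
        rng.choice([16, 32, 64, 128]),   # number of filters per conv layer
        rng.choice([3, 5, 7]),           # kernel size
        10 ** rng.uniform(-4, -1),       # steplength (learning rate)
        rng.choice([32, 64, 128, 256]),  # mini-batch size
    ])

def early_training_accuracy(config):
    """Placeholder: run only a few training steps of the CNN and measure
    validation accuracy. Faked here with random values."""
    return rng.uniform(0.1, 0.6)

def full_training_accuracy(config, early_acc):
    """Placeholder: the expensive complete training. Faked as a noisy
    monotone function of the early accuracy."""
    return min(1.0, early_acc + rng.uniform(0.2, 0.4))

# Build the predictor's training set from a limited number of configurations:
# input = configuration + early performance, label = final performance.
X, y = [], []
for _ in range(200):
    cfg = sample_configuration()
    early = early_training_accuracy(cfg)
    X.append(np.append(cfg, early))
    y.append(full_training_accuracy(cfg, early))
X, y = np.array(X), np.array(y)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
svr = SVR(kernel="rbf", C=10.0).fit(X_tr, y_tr)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("SVR R^2:", svr.score(X_te, y_te))
print("RF  R^2:", rf.score(X_te, y_te))

# Probabilistic exploration: sample many candidate configurations, pay only
# the cheap early-training cost, and keep the one whose predicted final
# performance is best.
best_cfg, best_pred = None, -np.inf
for _ in range(1000):
    cfg = sample_configuration()
    feat = np.append(cfg, early_training_accuracy(cfg)).reshape(1, -1)
    pred = rf.predict(feat)[0]
    if pred > best_pred:
        best_cfg, best_pred = cfg, pred
print("best predicted accuracy:", best_pred, "for config:", best_cfg)
```

The design point is that the expensive `full_training_accuracy` is paid only for the small set of configurations used to fit the predictor; every subsequent candidate costs only a few training steps plus one call to `predict`.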