Using Knowledge Transfer for Neural Network Architecture Optimization with Improved Training Efficiency

M. Gavrilescu, Florin Leon
DOI: 10.1109/ICSTCC55426.2022.9931898 (https://doi.org/10.1109/ICSTCC55426.2022.9931898)
Published in: 2022 26th International Conference on System Theory, Control and Computing (ICSTCC), 2022-10-19
Citations: 0

Abstract

Neural networks have demonstrated their potential for solving a wide range of real-world problems. While they are capable of learning correlations and relationships in complex datasets, neural network training and validation are often tedious and time-consuming, especially for large, high-dimensional datasets, given that enough models must be run through the validation pipeline to adequately cover the hyperparameter space. To address this problem, we propose a weight adjustment method that reduces training time when searching for the optimal neural network architecture by incorporating a knowledge transfer step into the corresponding pipeline. This involves learning weight features from a trained model and using them to shorten the training of a subsequent one. Considering a sequence of candidate neural networks with different hyperparameter values, each model, once trained, serves as a source of knowledge for the next one, reducing overall training time. We test our approach on datasets of various sizes, where we obtain reduced total training times, especially when small learning rates are used for finely tuned convergence.
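The abstract describes training candidate architectures in sequence and letting each trained model seed the next one. The sketch below is only an illustration of that idea, not the authors' algorithm: the layer sizes, the slice-copy transfer rule, the toy regression data, and the training loop are all assumptions introduced here for clarity.

```python
# A minimal sketch of sequential knowledge transfer during a hyperparameter
# search: each trained candidate seeds the weights of the next one.
# All specifics (architectures, copy rule, data) are illustrative assumptions.
import torch
import torch.nn as nn


def make_candidate(hidden: int) -> nn.Sequential:
    # One hidden layer whose width is the hyperparameter being searched.
    return nn.Sequential(nn.Linear(8, hidden), nn.ReLU(), nn.Linear(hidden, 1))


def transfer_weights(src: nn.Sequential, dst: nn.Sequential) -> None:
    # Copy the overlapping slice of each weight/bias tensor from the trained
    # source model into the new candidate; entries outside the overlap keep
    # their fresh initialization. This is one simple way to reuse learned
    # weight features across architectures of different sizes.
    with torch.no_grad():
        for s, d in zip(src.parameters(), dst.parameters()):
            slices = tuple(slice(0, min(a, b)) for a, b in zip(s.shape, d.shape))
            d[slices].copy_(s[slices])


def train(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
          lr: float = 1e-3, epochs: int = 200) -> float:
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    x = torch.randn(256, 8)
    y = x.sum(dim=1, keepdim=True)          # toy regression target

    previous = None
    for hidden in (16, 32, 64):             # candidate hyperparameter values
        candidate = make_candidate(hidden)
        if previous is not None:
            transfer_weights(previous, candidate)   # knowledge transfer step
        final_loss = train(candidate, x, y)
        print(f"hidden={hidden}, final loss={final_loss:.4f}")
        previous = candidate                # this model seeds the next one
```

Copying only the overlapping slice lets already-learned features carry over while the newly added capacity still trains from scratch; whether this matches the paper's exact weight adjustment rule cannot be determined from the abstract alone.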