An Empirical Study of the Performance of Different Optimizers in the Deep Neural Networks

A. Zohrevand, Z. Imani
{"title":"深度神经网络中不同优化器性能的实证研究","authors":"A. Zohrevand, Z. Imani","doi":"10.1109/MVIP53647.2022.9738743","DOIUrl":null,"url":null,"abstract":"In recent years, the Stochastic Gradient Descent (SGD) has been commonly used as an optimizer in the Conventional Neural Network (CNN) models. While many researchers have adopted CNN models to classify tasks, to the best of our knowledge, different optimizers developed for CNN have not been thoroughly studied and analyzed in the training CNNs. In this paper, attempts have been made to investigate the effects of the various optimizers on the performance of CNN. Two sets of experiments are conducted. First, for the classification of the records on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch by four different kinds of optimizers including SGD, Adam, Adadelta, and AdaGrad. Second, by the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify the Persian handwritten words. In both experiments, the results showed that Adam and AdaGrad have a relatively similar behavior and higher performance in comparison to the other two optimizers in terms of training cost and recognition accuracy. Also, the effect of different values of the initial learning rate on the performance of the Adam optimizer is investigated experimentally. The result revealed that lower values lead to converges more rapidly.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"An Empirical Study of the Performance of Different Optimizers in the Deep Neural Networks\",\"authors\":\"A. Zohrevand, Z. Imani\",\"doi\":\"10.1109/MVIP53647.2022.9738743\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the Stochastic Gradient Descent (SGD) has been commonly used as an optimizer in the Conventional Neural Network (CNN) models. While many researchers have adopted CNN models to classify tasks, to the best of our knowledge, different optimizers developed for CNN have not been thoroughly studied and analyzed in the training CNNs. In this paper, attempts have been made to investigate the effects of the various optimizers on the performance of CNN. Two sets of experiments are conducted. First, for the classification of the records on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch by four different kinds of optimizers including SGD, Adam, Adadelta, and AdaGrad. Second, by the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify the Persian handwritten words. In both experiments, the results showed that Adam and AdaGrad have a relatively similar behavior and higher performance in comparison to the other two optimizers in terms of training cost and recognition accuracy. Also, the effect of different values of the initial learning rate on the performance of the Adam optimizer is investigated experimentally. 
The result revealed that lower values lead to converges more rapidly.\",\"PeriodicalId\":184716,\"journal\":{\"name\":\"2022 International Conference on Machine Vision and Image Processing (MVIP)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Machine Vision and Image Processing (MVIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MVIP53647.2022.9738743\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Machine Vision and Image Processing (MVIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MVIP53647.2022.9738743","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

In recent years, Stochastic Gradient Descent (SGD) has been commonly used as the optimizer in Convolutional Neural Network (CNN) models. While many researchers have adopted CNN models for classification tasks, to the best of our knowledge, the different optimizers developed for CNNs have not been thoroughly studied and analyzed in the context of training them. In this paper, we investigate the effects of various optimizers on the performance of CNNs. Two sets of experiments are conducted. First, for classification on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch with four different optimizers: SGD, Adam, Adadelta, and AdaGrad. Second, with the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify Persian handwritten words. In both experiments, the results show that Adam and AdaGrad behave similarly and achieve higher performance than the other two optimizers in terms of training cost and recognition accuracy. In addition, the effect of different initial learning rates on the performance of the Adam optimizer is investigated experimentally. The results reveal that lower values lead to more rapid convergence.
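
The first experiment described in the abstract can be outlined with standard deep-learning tooling. The sketch below is a minimal, hypothetical PyTorch/torchvision setup (not the authors' code, and the paper does not name a framework) that trains VGG11 from scratch on CIFAR10 once per optimizer; the batch size, learning rates, epoch count, and normalization constants are illustrative assumptions rather than the paper's settings.

```python
# Hypothetical sketch (not the authors' code): comparing the four optimizers from the
# paper by training torchvision's VGG11 from scratch on CIFAR10 in PyTorch.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative preprocessing; the paper does not specify these values.
transform = T.Compose([T.ToTensor(), T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

def make_optimizer(name, params, lr=1e-3):
    # Learning rates here are illustrative defaults, not the paper's settings.
    if name == "SGD":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "Adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "Adadelta":
        return torch.optim.Adadelta(params)  # Adadelta's default lr is 1.0
    if name == "AdaGrad":
        return torch.optim.Adagrad(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")

criterion = nn.CrossEntropyLoss()

for name in ["SGD", "Adam", "Adadelta", "AdaGrad"]:
    # weights=None gives a randomly initialized network, i.e. training from scratch.
    model = torchvision.models.vgg11(weights=None, num_classes=10).to(device)
    optimizer = make_optimizer(name, model.parameters())
    model.train()
    for epoch in range(5):  # a short run is enough to compare training cost
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"{name} epoch {epoch}: mean loss {running_loss / len(train_loader):.4f}")
```

The same loop, run with several values of the `lr` argument for Adam only, would serve as a rough analogue of the paper's final experiment on the effect of the initial learning rate.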