An Empirical Study of the Performance of Different Optimizers in the Deep Neural Networks

A. Zohrevand, Z. Imani
{"title":"深度神经网络中不同优化器性能的实证研究","authors":"A. Zohrevand, Z. Imani","doi":"10.1109/MVIP53647.2022.9738743","DOIUrl":null,"url":null,"abstract":"In recent years, the Stochastic Gradient Descent (SGD) has been commonly used as an optimizer in the Conventional Neural Network (CNN) models. While many researchers have adopted CNN models to classify tasks, to the best of our knowledge, different optimizers developed for CNN have not been thoroughly studied and analyzed in the training CNNs. In this paper, attempts have been made to investigate the effects of the various optimizers on the performance of CNN. Two sets of experiments are conducted. First, for the classification of the records on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch by four different kinds of optimizers including SGD, Adam, Adadelta, and AdaGrad. Second, by the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify the Persian handwritten words. In both experiments, the results showed that Adam and AdaGrad have a relatively similar behavior and higher performance in comparison to the other two optimizers in terms of training cost and recognition accuracy. Also, the effect of different values of the initial learning rate on the performance of the Adam optimizer is investigated experimentally. The result revealed that lower values lead to converges more rapidly.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"An Empirical Study of the Performance of Different Optimizers in the Deep Neural Networks\",\"authors\":\"A. Zohrevand, Z. Imani\",\"doi\":\"10.1109/MVIP53647.2022.9738743\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the Stochastic Gradient Descent (SGD) has been commonly used as an optimizer in the Conventional Neural Network (CNN) models. While many researchers have adopted CNN models to classify tasks, to the best of our knowledge, different optimizers developed for CNN have not been thoroughly studied and analyzed in the training CNNs. In this paper, attempts have been made to investigate the effects of the various optimizers on the performance of CNN. Two sets of experiments are conducted. First, for the classification of the records on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch by four different kinds of optimizers including SGD, Adam, Adadelta, and AdaGrad. Second, by the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify the Persian handwritten words. In both experiments, the results showed that Adam and AdaGrad have a relatively similar behavior and higher performance in comparison to the other two optimizers in terms of training cost and recognition accuracy. Also, the effect of different values of the initial learning rate on the performance of the Adam optimizer is investigated experimentally. 
The result revealed that lower values lead to converges more rapidly.\",\"PeriodicalId\":184716,\"journal\":{\"name\":\"2022 International Conference on Machine Vision and Image Processing (MVIP)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Machine Vision and Image Processing (MVIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MVIP53647.2022.9738743\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Machine Vision and Image Processing (MVIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MVIP53647.2022.9738743","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

In recent years, Stochastic Gradient Descent (SGD) has been commonly used as the optimizer in Convolutional Neural Network (CNN) models. While many researchers have adopted CNN models for classification tasks, to the best of our knowledge, the different optimizers developed for CNNs have not been thoroughly studied and analyzed in the context of training them. In this paper, we investigate the effects of various optimizers on the performance of CNNs. Two sets of experiments are conducted. First, for classification on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch with four different optimizers: SGD, Adam, Adadelta, and AdaGrad. Second, with the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify Persian handwritten words. In both experiments, the results show that Adam and AdaGrad behave similarly and achieve higher performance than the other two optimizers in terms of training cost and recognition accuracy. In addition, the effect of different initial learning rates on the performance of the Adam optimizer is investigated experimentally. The results reveal that lower values lead to more rapid convergence.
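
The first experiment described in the abstract can be outlined with standard deep-learning tooling. The sketch below is a minimal, hypothetical PyTorch/torchvision setup (not the authors' code, and the paper does not name a framework) that trains VGG11 from scratch on CIFAR10 once per optimizer; the batch size, learning rates, epoch count, and normalization constants are illustrative assumptions rather than the paper's settings.

```python
# Hypothetical sketch (not the authors' code): comparing the four optimizers from the
# paper by training torchvision's VGG11 from scratch on CIFAR10 in PyTorch.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative preprocessing; the paper does not specify these values.
transform = T.Compose([T.ToTensor(), T.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

def make_optimizer(name, params, lr=1e-3):
    # Learning rates here are illustrative defaults, not the paper's settings.
    if name == "SGD":
        return torch.optim.SGD(params, lr=lr, momentum=0.9)
    if name == "Adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "Adadelta":
        return torch.optim.Adadelta(params)  # Adadelta's default lr is 1.0
    if name == "AdaGrad":
        return torch.optim.Adagrad(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")

criterion = nn.CrossEntropyLoss()

for name in ["SGD", "Adam", "Adadelta", "AdaGrad"]:
    # weights=None gives a randomly initialized network, i.e. training from scratch.
    model = torchvision.models.vgg11(weights=None, num_classes=10).to(device)
    optimizer = make_optimizer(name, model.parameters())
    model.train()
    for epoch in range(5):  # a short run is enough to compare training cost
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"{name} epoch {epoch}: mean loss {running_loss / len(train_loader):.4f}")
```

The same loop, run with several values of the `lr` argument for Adam only, would serve as a rough analogue of the paper's final experiment on the effect of the initial learning rate.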