{"title":"深度神经网络中不同优化器性能的实证研究","authors":"A. Zohrevand, Z. Imani","doi":"10.1109/MVIP53647.2022.9738743","DOIUrl":null,"url":null,"abstract":"In recent years, the Stochastic Gradient Descent (SGD) has been commonly used as an optimizer in the Conventional Neural Network (CNN) models. While many researchers have adopted CNN models to classify tasks, to the best of our knowledge, different optimizers developed for CNN have not been thoroughly studied and analyzed in the training CNNs. In this paper, attempts have been made to investigate the effects of the various optimizers on the performance of CNN. Two sets of experiments are conducted. First, for the classification of the records on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch by four different kinds of optimizers including SGD, Adam, Adadelta, and AdaGrad. Second, by the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify the Persian handwritten words. In both experiments, the results showed that Adam and AdaGrad have a relatively similar behavior and higher performance in comparison to the other two optimizers in terms of training cost and recognition accuracy. Also, the effect of different values of the initial learning rate on the performance of the Adam optimizer is investigated experimentally. The result revealed that lower values lead to converges more rapidly.","PeriodicalId":184716,"journal":{"name":"2022 International Conference on Machine Vision and Image Processing (MVIP)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"An Empirical Study of the Performance of Different Optimizers in the Deep Neural Networks\",\"authors\":\"A. Zohrevand, Z. Imani\",\"doi\":\"10.1109/MVIP53647.2022.9738743\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, the Stochastic Gradient Descent (SGD) has been commonly used as an optimizer in the Conventional Neural Network (CNN) models. While many researchers have adopted CNN models to classify tasks, to the best of our knowledge, different optimizers developed for CNN have not been thoroughly studied and analyzed in the training CNNs. In this paper, attempts have been made to investigate the effects of the various optimizers on the performance of CNN. Two sets of experiments are conducted. First, for the classification of the records on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch by four different kinds of optimizers including SGD, Adam, Adadelta, and AdaGrad. Second, by the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify the Persian handwritten words. In both experiments, the results showed that Adam and AdaGrad have a relatively similar behavior and higher performance in comparison to the other two optimizers in terms of training cost and recognition accuracy. Also, the effect of different values of the initial learning rate on the performance of the Adam optimizer is investigated experimentally. 
The result revealed that lower values lead to converges more rapidly.\",\"PeriodicalId\":184716,\"journal\":{\"name\":\"2022 International Conference on Machine Vision and Image Processing (MVIP)\",\"volume\":\"11 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-02-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 International Conference on Machine Vision and Image Processing (MVIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MVIP53647.2022.9738743\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 International Conference on Machine Vision and Image Processing (MVIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MVIP53647.2022.9738743","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Empirical Study of the Performance of Different Optimizers in the Deep Neural Networks
In recent years, Stochastic Gradient Descent (SGD) has commonly been used as the optimizer in Convolutional Neural Network (CNN) models. While many researchers have adopted CNN models for classification tasks, to the best of our knowledge, the different optimizers developed for CNNs have not been thoroughly studied and analyzed in the context of CNN training. In this paper, we investigate the effect of various optimizers on the performance of CNNs. Two sets of experiments are conducted. First, for classification on the CIFAR10, MNIST, and Fashion MNIST datasets, a well-known CNN called VGG11 is trained from scratch with four different optimizers: SGD, Adam, Adadelta, and AdaGrad. Second, with the same four optimizers, a popular CNN architecture called AlexNet is fine-tuned to classify Persian handwritten words. In both experiments, the results show that Adam and AdaGrad behave similarly and outperform the other two optimizers in terms of training cost and recognition accuracy. In addition, the effect of different initial learning rates on the performance of the Adam optimizer is investigated experimentally. The results reveal that lower values lead to faster convergence.
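The abstract does not give implementation details. As a rough illustration of the experimental setup it describes, the following is a minimal PyTorch sketch of comparing the four optimizers on the same architecture; the use of torchvision's vgg11, the learning rate of 1e-3, and the make_optimizer helper are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch (not from the paper): training the same CNN with each of the
# four optimizers compared in the study, assuming a PyTorch/torchvision setup.
import torch
import torch.nn as nn
from torchvision.models import vgg11

def make_optimizer(name, params, lr=1e-3):
    """Hypothetical helper returning one of the four optimizers under comparison."""
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr)
    if name == "adadelta":
        return torch.optim.Adadelta(params, lr=lr)
    if name == "adagrad":
        return torch.optim.Adagrad(params, lr=lr)
    raise ValueError(f"unknown optimizer: {name}")

criterion = nn.CrossEntropyLoss()
for name in ["sgd", "adam", "adadelta", "adagrad"]:
    model = vgg11(num_classes=10)  # a fresh model per run, trained from scratch
    optimizer = make_optimizer(name, model.parameters(), lr=1e-3)
    # The training loop is kept identical across optimizers, e.g.:
    # for images, labels in train_loader:
    #     optimizer.zero_grad()
    #     loss = criterion(model(images), labels)
    #     loss.backward()
    #     optimizer.step()
```

The key point of such a comparison is that only the optimizer (and, in the paper's learning-rate study, the initial learning rate passed to Adam) varies between runs, while the architecture, data, and training loop stay fixed, so differences in training cost and accuracy can be attributed to the optimizer itself.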