{"title":"Comparison of the stochastic gradient descent based optimization techniques","authors":"Ersan Yazan, M. F. Talu","doi":"10.1109/IDAP.2017.8090299","DOIUrl":null,"url":null,"abstract":"The stochastic gradual descent method (SGD) is a popular optimization technique based on updating each θk parameter in the ∂J(θ)/∂θk direction to minimize / maximize the J(θ) cost function. This technique is frequently used in current artificial learning methods such as convolutional learning and automatic encoders. In this study, five different approaches (Momentum, Adagrad, Adadelta, Rmsprop ve Adam) based on SDA used in updating the θ parameters were investigated. By selecting specific test functions, the advantages and disadvantages of each approach are compared with each other in terms of the number of oscillations, the parameter update rate and the minimum cost reached. The comparison results are shown graphically.","PeriodicalId":111721,"journal":{"name":"2017 International Artificial Intelligence and Data Processing Symposium (IDAP)","volume":"208 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"54","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 International Artificial Intelligence and Data Processing Symposium (IDAP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IDAP.2017.8090299","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 54
Abstract
The stochastic gradient descent (SGD) method is a popular optimization technique based on updating each parameter θk in the direction ∂J(θ)/∂θk to minimize/maximize the cost function J(θ). This technique is frequently used in current machine learning methods such as convolutional neural networks and autoencoders. In this study, five different SGD-based approaches (Momentum, Adagrad, Adadelta, RMSprop and Adam) used in updating the θ parameters were investigated. By selecting specific test functions, the advantages and disadvantages of each approach are compared with each other in terms of the number of oscillations, the parameter update rate and the minimum cost reached. The comparison results are shown graphically.
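To make the update rule and the compared variants concrete, the following is a minimal illustrative sketch (not the paper's code): it minimizes an assumed ill-conditioned quadratic cost J(θ) = 0.5·θᵀAθ with plain SGD, Momentum, and Adam, so the difference in oscillation and in the minimum cost reached can be observed numerically. The test function, learning rates, and iteration count are assumptions chosen for readability; the paper uses its own selected test functions.

```python
import numpy as np

# Ill-conditioned quadratic chosen to provoke oscillation (an assumed test function).
A = np.diag([1.0, 10.0])

def grad(theta):
    """Gradient dJ/dtheta of J(theta) = 0.5 * theta^T A theta."""
    return A @ theta

def run(update, theta0, steps=200):
    """Apply one optimizer's per-step update rule repeatedly; return the trajectory."""
    theta, state, path = theta0.copy(), {}, [theta0.copy()]
    for _ in range(steps):
        theta = update(theta, grad(theta), state)
        path.append(theta.copy())
    return np.array(path)

def sgd(theta, g, state, lr=0.05):
    # Plain SGD: step against the gradient direction.
    return theta - lr * g

def momentum(theta, g, state, lr=0.05, beta=0.9):
    # Momentum: accumulate a velocity that damps oscillation along steep axes.
    v = beta * state.get("v", np.zeros_like(theta)) + g
    state["v"] = v
    return theta - lr * v

def adam(theta, g, state, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first and second moment estimates scale each coordinate.
    m = state.get("m", np.zeros_like(theta))
    v = state.get("v", np.zeros_like(theta))
    t = state.get("t", 0) + 1
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    state.update(m=m, v=v, t=t)
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)

theta0 = np.array([3.0, 3.0])
for name, upd in [("SGD", sgd), ("Momentum", momentum), ("Adam", adam)]:
    path = run(upd, theta0)
    cost = 0.5 * path[-1] @ A @ path[-1]
    print(f"{name:9s} final cost = {cost:.3e}")
```

Running the sketch prints the minimum cost each variant reaches after the same number of updates; inspecting the stored trajectories shows how Momentum and Adam reduce the oscillation that plain SGD exhibits along the steep coordinate.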