Research on three-step accelerated gradient algorithm in deep learning

Impact Factor: 0.7 · Q3, Statistics & Probability
Yongqiang Lian, Yincai Tang, Shirong Zhou
{"title":"Research on three-step accelerated gradient algorithm in deep learning","authors":"Yongqiang Lian, Yincai Tang, Shirong Zhou","doi":"10.1080/24754269.2020.1846414","DOIUrl":null,"url":null,"abstract":"Gradient descent (GD) algorithm is the widely used optimisation method in training machine learning and deep learning models. In this paper, based on GD, Polyak's momentum (PM), and Nesterov accelerated gradient (NAG), we give the convergence of the algorithms from an initial value to the optimal value of an objective function in simple quadratic form. Based on the convergence property of the quadratic function, two sister sequences of NAG's iteration and parallel tangent methods in neural networks, the three-step accelerated gradient (TAG) algorithm is proposed, which has three sequences other than two sister sequences. To illustrate the performance of this algorithm, we compare the proposed algorithm with the three other algorithms in quadratic function, high-dimensional quadratic functions, and nonquadratic function. Then we consider to combine the TAG algorithm to the backpropagation algorithm and the stochastic gradient descent algorithm in deep learning. For conveniently facilitate the proposed algorithms, we rewite the R package ‘neuralnet’ and extend it to ‘supneuralnet’. All kinds of deep learning algorithms in this paper are included in ‘supneuralnet’ package. Finally, we show our algorithms are superior to other algorithms in four case studies.","PeriodicalId":22070,"journal":{"name":"Statistical Theory and Related Fields","volume":"6 1","pages":"40 - 57"},"PeriodicalIF":0.7000,"publicationDate":"2020-11-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/24754269.2020.1846414","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Statistical Theory and Related Fields","FirstCategoryId":"96","ListUrlMain":"https://doi.org/10.1080/24754269.2020.1846414","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Citations: 0

Abstract

The gradient descent (GD) algorithm is a widely used optimisation method for training machine learning and deep learning models. In this paper, based on GD, Polyak's momentum (PM), and Nesterov accelerated gradient (NAG), we give the convergence of these algorithms from an initial value to the optimal value of an objective function in simple quadratic form. Based on the convergence property of the quadratic function, the two sister sequences of NAG's iteration, and the parallel tangent method in neural networks, the three-step accelerated gradient (TAG) algorithm is proposed, which has three sequences rather than two sister sequences. To illustrate the performance of this algorithm, we compare the proposed algorithm with the other three algorithms on a quadratic function, high-dimensional quadratic functions, and a nonquadratic function. We then consider combining the TAG algorithm with the backpropagation algorithm and the stochastic gradient descent algorithm in deep learning. To facilitate the proposed algorithms, we rewrite the R package 'neuralnet' and extend it to 'supneuralnet'. All deep learning algorithms in this paper are included in the 'supneuralnet' package. Finally, we show that our algorithms are superior to the other algorithms in four case studies.
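For readers unfamiliar with the three baseline methods named in the abstract, the R sketch below (not taken from the paper or the 'supneuralnet' package; the matrix A, step size eta = 0.1, and momentum mu = 0.9 are illustrative assumptions) shows the standard GD, Polyak-momentum, and NAG updates on a simple quadratic f(x) = 0.5 x'Ax. NAG's look-ahead point y and iterate x are the "two sister sequences" the abstract refers to; the additional third sequence that defines TAG is given in the paper itself and is not reproduced here.

# Minimal sketch: GD, Polyak's momentum (PM), and NAG on f(x) = 0.5 * t(x) %*% A %*% x.
# A, eta, mu, and x0 below are illustrative choices, not values from the paper.
A    <- matrix(c(4, 0, 0, 1), nrow = 2)   # simple positive-definite quadratic
grad <- function(x) as.vector(A %*% x)    # gradient of f(x) = 0.5 x'Ax

eta <- 0.1    # learning rate (assumed)
mu  <- 0.9    # momentum coefficient (assumed)
x0  <- c(5, 5)

# Plain gradient descent: x_{k+1} = x_k - eta * grad(x_k)
gd <- function(x, iters = 100) {
  for (k in seq_len(iters)) x <- x - eta * grad(x)
  x
}

# Polyak's momentum (heavy ball): maintains a velocity sequence v_k
pm <- function(x, iters = 100) {
  v <- rep(0, length(x))
  for (k in seq_len(iters)) {
    v <- mu * v - eta * grad(x)
    x <- x + v
  }
  x
}

# NAG: two "sister" sequences, the look-ahead point y_k and the iterate x_k
nag <- function(x, iters = 100) {
  v <- rep(0, length(x))
  for (k in seq_len(iters)) {
    y <- x + mu * v              # look-ahead sequence
    v <- mu * v - eta * grad(y)  # gradient evaluated at the look-ahead point
    x <- x + v                   # iterate sequence
  }
  x
}

gd(x0); pm(x0); nag(x0)   # all should approach the minimiser c(0, 0)

Comparing how fast the three runs shrink toward c(0, 0) reproduces, in miniature, the kind of quadratic-function comparison the abstract describes before the TAG algorithm is introduced.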
Source journal metrics: CiteScore 0.90; self-citation rate 20.00%; 21 articles published.