{"title":"Proximal recursive generalized hyper-gradient descent method","authors":"Hao Zhang, Shuxia Lu","doi":"10.1016/j.asoc.2025.113073","DOIUrl":null,"url":null,"abstract":"<div><div>This paper focuses on the non-convex, non-smooth composite optimization problem. It consists of a non-convex loss function and a non-smooth regularizer function that admits a proximal mapping. However, the method is still limited in handling objective functions that involve non-smooth regularizer. How to determine the step size for solving composite optimization problems can be a challenge. To address this gap, we propose a recursive gradient descent algorithm using generalized hyper-gradient descent, named ProxSarah-GHD, which utilizes variance reduction techniques and provides update rules for adaptive step sizes. To improve its generalization in proximal gradient descent, a generalized variant of hyper-gradient descent, named <strong>G</strong>eneralized <strong>H</strong>yper-gradient <strong>D</strong>escent (GHD), is proposed in this paper. We prove that ProxSarah-GHD attains a linear convergence rate. Moreover, we provide the oracle complexity of ProxSarah-GHD as <span><math><mrow><mi>O</mi><mfenced><mrow><msup><mrow><mi>ϵ</mi></mrow><mrow><mo>−</mo><mn>3</mn></mrow></msup></mrow></mfenced></mrow></math></span> and <span><math><mrow><mi>O</mi><mfenced><mrow><msqrt><mrow><mi>n</mi></mrow></msqrt><msup><mrow><mi>ϵ</mi></mrow><mrow><mo>−</mo><mn>2</mn></mrow></msup><mo>+</mo><mi>n</mi></mrow></mfenced></mrow></math></span> in the online setting and finite-sum setting, respectively. In addition, to avoid the trouble of manually adjusting the batch size, we develop a novel <strong>E</strong>xponentially <strong>I</strong>ncreasing <strong>M</strong>ini-batch scheme for ProxSarah-GHD, named ProxSarah-GHD-EIM. The theoretical analysis that shows ProxSarah-GHD-EIM achieves a linear convergence rate is also provided, and shows that its total complexity is <span><math><mrow><mi>O</mi><mfenced><mrow><msup><mrow><mi>ϵ</mi></mrow><mrow><mo>−</mo><mn>4</mn></mrow></msup><mo>+</mo><msup><mrow><mi>ϵ</mi></mrow><mrow><mo>−</mo><mn>2</mn></mrow></msup></mrow></mfenced></mrow></math></span> and <span><math><mrow><mi>O</mi><mfenced><mrow><mi>n</mi><mo>+</mo><msup><mrow><mi>ϵ</mi></mrow><mrow><mo>−</mo><mn>4</mn></mrow></msup><mo>+</mo><msup><mrow><mi>ϵ</mi></mrow><mrow><mo>−</mo><mn>2</mn></mrow></msup></mrow></mfenced></mrow></math></span> in the online setting and finite-sum setting, respectively. Numerical experiments on standard datasets verify the superiority of the ProxSarah-GHD over other methods. We further analyze the sensitivity of the ProxSarah-GHD-EIM to its hyperparameters, conducting experiments on standard datasets.</div></div>","PeriodicalId":50737,"journal":{"name":"Applied Soft Computing","volume":"175 ","pages":"Article 113073"},"PeriodicalIF":7.2000,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Soft Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1568494625003849","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0
Abstract
This paper focuses on the non-convex, non-smooth composite optimization problem, whose objective consists of a non-convex loss function and a non-smooth regularizer that admits a proximal mapping. Existing recursive gradient methods, however, remain limited in handling objectives that involve a non-smooth regularizer, and determining the step size when solving composite optimization problems can be a challenge. To address this gap, we propose a recursive proximal gradient descent algorithm with generalized hyper-gradient descent, named ProxSarah-GHD, which utilizes variance-reduction techniques and provides update rules for adaptive step sizes. To improve its generalization in proximal gradient descent, a generalized variant of hyper-gradient descent, named Generalized Hyper-gradient Descent (GHD), is proposed in this paper. We prove that ProxSarah-GHD attains a linear convergence rate. Moreover, we provide the oracle complexity of ProxSarah-GHD as $O(\epsilon^{-3})$ in the online setting and $O(\sqrt{n}\,\epsilon^{-2} + n)$ in the finite-sum setting. In addition, to avoid the trouble of manually adjusting the batch size, we develop a novel Exponentially Increasing Mini-batch scheme for ProxSarah-GHD, named ProxSarah-GHD-EIM. We also provide a theoretical analysis showing that ProxSarah-GHD-EIM achieves a linear convergence rate, with total complexity $O(\epsilon^{-4} + \epsilon^{-2})$ in the online setting and $O(n + \epsilon^{-4} + \epsilon^{-2})$ in the finite-sum setting. Numerical experiments on standard datasets verify the superiority of ProxSarah-GHD over other methods. We further analyze the sensitivity of ProxSarah-GHD-EIM to its hyperparameters, conducting experiments on standard datasets.
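For readers unfamiliar with this family of methods, the sketch below illustrates the general idea behind a proximal SARAH-type loop with a hypergradient-style adaptive step size: a full gradient is computed at each outer snapshot, a recursive (SARAH) estimator is updated on mini-batches, the non-smooth regularizer is handled through its proximal mapping, and the step size is adapted from the agreement between consecutive gradient estimators. This is a minimal sketch under assumed definitions: `prox_l1`, `grad_fn`, and all constants are illustrative, and the step-size update shown is the classical hypergradient heuristic, not the paper's generalized GHD rule or its EIM batch schedule.

```python
import numpy as np

def prox_l1(z, thresh):
    """Soft-thresholding: proximal mapping of thresh * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def prox_sarah_hd_sketch(grad_fn, x0, n, lam=1e-3, eta=0.1, beta=1e-4,
                         outer=10, inner=50, batch=32, seed=0):
    """Generic proximal SARAH loop with a hypergradient-style step size.

    grad_fn(x, idx): average gradient of the smooth loss over indices idx.
    All names and constants here are illustrative assumptions, not the
    paper's ProxSarah-GHD specification.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(outer):
        v = grad_fn(x, np.arange(n))            # full gradient at the snapshot
        v_prev = v.copy()
        x_prev = x.copy()
        x = prox_l1(x - eta * v, eta * lam)     # proximal gradient step
        for _ in range(inner):
            idx = rng.choice(n, size=batch, replace=False)
            # Recursive (SARAH) variance-reduced gradient estimator.
            v_prev, v = v, grad_fn(x, idx) - grad_fn(x_prev, idx) + v
            # Classical hypergradient heuristic: grow eta when consecutive
            # estimators point the same way; the paper's GHD rule generalizes
            # this idea but is not reproduced here.
            eta = max(eta + beta * float(np.dot(v, v_prev)), 1e-6)
            x_prev = x.copy()
            x = prox_l1(x - eta * v, eta * lam)
    return x
```

In this sketch the proximal operator is the L1 soft-thresholding map purely as an example; any regularizer with a computable proximal mapping could be substituted.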
Journal introduction:
Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest quality research in the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets and other similar techniques to address real-world complexities.
Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. The website is therefore updated continuously with new articles, and publication times are short.