Variance Reduction Techniques for Stochastic Proximal Point Algorithms

IF 1.5 · Zone 3, Mathematics · Q2 MATHEMATICS, APPLIED

Journal of Optimization Theory and Applications Pub Date : 2024-08-02 DOI:10.1007/s10957-024-02502-6

Cheik Traoré, Vassilis Apidopoulos, Saverio Salzo, Silvia Villa



Abstract

In the context of finite-sum minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact and their theoretical properties are both clear. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms because they are more stable with respect to the choice of step size. However, their variance-reduced versions are not as well studied as their gradient counterparts. In this work, we propose the first unified study of variance reduction techniques for stochastic proximal point algorithms. We introduce a generic stochastic proximal-based algorithm that can be specialized to give the proximal versions of SVRG, SAGA, and some of their variants. For this algorithm, in the smooth setting, we provide several convergence rates for the iterates and the objective function values, which are faster than those of the vanilla stochastic proximal point algorithm. More specifically, for convex functions, we prove a sublinear convergence rate of O(1/k). In addition, under the Polyak-Łojasiewicz condition, we obtain linear convergence rates. Finally, our numerical experiments show that, in most cases and especially for difficult problems, the proximal variance reduction methods are more stable than their gradient counterparts with respect to the choice of step size.
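To make the idea of a variance-reduced proximal step concrete, here is a minimal sketch of an SVRG-style stochastic proximal point iteration on a least-squares finite sum. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the update x_new = prox_{step * f_i}(x + step * (grad f_i(anchor) - grad F(anchor))), the choice f_i(x) = 0.5 * (a_i . x - b_i)^2, and all function names (`prox_least_squares`, `svrg_prox_point`, and so on) are hypothetical choices made for this example.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def prox_least_squares(v, a, b, step):
    """Closed-form prox of step * 0.5 * (a.x - b)^2, evaluated at v."""
    residual = (dot(a, v) - b) / (1.0 + step * dot(a, a))
    return [vi - step * residual * ai for vi, ai in zip(v, a)]


def full_gradient(x, A, b):
    """Gradient of F(x) = (1/n) * sum_i 0.5 * (a_i.x - b_i)^2."""
    n = len(A)
    grad = [0.0] * len(x)
    for a_i, b_i in zip(A, b):
        r = dot(a_i, x) - b_i
        for j, a_ij in enumerate(a_i):
            grad[j] += r * a_ij / n
    return grad


def objective(x, A, b):
    """Value of F(x) for the least-squares finite sum."""
    return sum(0.5 * (dot(a_i, x) - b_i) ** 2 for a_i, b_i in zip(A, b)) / len(A)


def svrg_prox_point(A, b, step, epochs, inner_steps, seed=0):
    """SVRG-style proximal point sketch (assumed form, see lead-in)."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    for _ in range(epochs):
        # Refresh the anchor point and its full gradient once per epoch.
        anchor = list(x)
        g_anchor = full_gradient(anchor, A, b)
        for _ in range(inner_steps):
            i = rng.randrange(n)
            r_anchor = dot(A[i], anchor) - b[i]
            # Zero-mean correction: grad f_i(anchor) - grad F(anchor).
            shifted = [
                x[j] + step * (r_anchor * A[i][j] - g_anchor[j])
                for j in range(d)
            ]
            # Implicit (proximal) step on the sampled component f_i.
            x = prox_least_squares(shifted, A[i], b[i], step)
    return x


if __name__ == "__main__":
    # Small consistent system with solution (1, 2); data chosen for illustration.
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    b = [1.0, 2.0, 3.0]
    x = svrg_prox_point(A, b, step=0.1, epochs=50, inner_steps=20)
    print(x, objective(x, A, b))
```

The proximal step is available in closed form here only because each component is a scalar quadratic; for other losses the prox would typically require a small inner solve.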

Source journal

Journal of Optimization Theory and Applications (Mathematics, Applied Mathematics)

CiteScore: 3.30

Self-citation rate: 5.30%

Articles published: 149

Review time: 9.9 months

Journal description: The Journal of Optimization Theory and Applications is devoted to the publication of carefully selected regular papers, invited papers, survey papers, technical notes, book notices, and forums that cover mathematical optimization techniques and their applications to science and engineering. Typical theoretical areas include linear, nonlinear, mathematical, and dynamic programming. Among the areas of application covered are mathematical economics, mathematical physics and biology, and aerospace, chemical, civil, electrical, and mechanical engineering.