On the Improvement of the Barzilai–Borwein Step Size in Variance Reduction Methods

IF 1.7 · Zone 2 (Mathematics) · Q2 MATHEMATICS, APPLIED

Applied Mathematics and Optimization · Pub Date: 2025-09-27 · DOI: 10.1007/s00245-025-10316-9

Hai Liu, Yan Liu, Tiande Guo, Congying Han

Vol. 92, No. 2

Citations: 0

Abstract

We propose several modifications of the Barzilai–Borwein (BB) step size in variance reduction (VR) methods for finite-sum optimization problems. Our first approach relies on a scalar function, which we call the TaiL Function (TLF). The TLF maps the computed BB step size to a positive real number, which is then used as the step size instead. The computational overhead is almost negligible, and the functional forms of the TLFs in this work do not involve any problem-dependent parameters. In the strongly convex setting, the condition number \(\kappa \) appears undesirably in the linear convergence rate, so the IFO complexity of VR methods with the BB step size has the form \(\mathcal {O}((n+\kappa ^a)\kappa \log (1/\epsilon ))\), \(a\in \mathbb {R}_{+}\). Using the TLF, this complexity improves to \(\mathcal {O}((n+\kappa ^{\tilde{a}})\log (1/\epsilon ))\), \(\tilde{a}\in \mathbb {R}_{+}, \tilde{a}<a\). In the non-convex setting, we improve the IFO complexity of SVRG-SBB from \(\mathcal {O}(n+n\epsilon ^{-1})\) to \(\mathcal {O}(n+n^{\beta }\epsilon ^{-1})\), where \(\beta \in \mathbb {R}_{+}\) can take any value in \((2/3, 1)\). In particular, the constant step size regime is recovered by taking the TLF to be a constant function whose value depends on problem-dependent parameters. As a counterpart to the constant step size regime, we also propose a BB-based vibration technique for setting the step sizes of VR methods, which yields methods with novel one-parameter step sizes. These methods have the same complexities as their constant step size versions, and empirically they are more robust with respect to their single step size parameter. Moreover, we propose a novel analysis of SARAH-I-type methods in the strongly convex setting. Numerical tests corroborate the proposed methods.
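
As a concrete reference point, the following is a minimal sketch of SVRG with the epoch-level BB rule used by SVRG-BB (Tan et al., NeurIPS 2016), eta_k = ||s||^2 / (m * s^T y) with s and y the differences of the last two snapshots and their full gradients, on a synthetic ridge-regression finite sum. The `tlf` argument is a hypothetical hook showing where a TaiL Function would act: it receives the raw BB step and returns the step actually used. The abstract does not state the paper's TLF formulas, so none is implemented; only the constant-function case, which the abstract says recovers the constant step size regime, is exercised. The problem generator, `eta0`, the inner-loop length `m = 2n`, and all function names are illustrative assumptions, not the paper's code.

```python
import numpy as np


def make_problem(n=200, d=5, lam=0.1, seed=0):
    """Synthetic ridge-regression finite sum (an assumption for illustration):
    f(x) = (1/n) sum_i f_i(x), f_i(x) = 0.5*(a_i @ x - b_i)**2 + 0.5*lam*||x||^2."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)
    return A, b, lam


def grad_i(A, b, lam, x, i):
    # Gradient of a single component f_i (one IFO call).
    return A[i] * (A[i] @ x - b[i]) + lam * x


def full_grad(A, b, lam, x):
    # Gradient of the average f = (1/n) sum_i f_i (n IFO calls).
    return A.T @ (A @ x - b) / len(b) + lam * x


def svrg_bb(A, b, lam, epochs=30, m=None, eta0=1e-3, tlf=None, seed=1):
    """SVRG whose step size is the SVRG-BB rule, recomputed once per epoch.

    tlf is a HYPOTHETICAL hook: a scalar map applied to the raw BB step.
    tlf=None keeps the plain BB step; a constant function gives a constant step.
    """
    n, d = A.shape
    m = 2 * n if m is None else m
    rng = np.random.default_rng(seed)
    x_tilde = np.zeros(d)
    x_prev = g_prev = None
    eta = eta0  # raw (un-mapped) BB step; eta0 is used in the first epoch
    steps = []
    for _ in range(epochs):
        g = full_grad(A, b, lam, x_tilde)  # snapshot full gradient
        if x_prev is not None:
            s, y = x_tilde - x_prev, g - g_prev
            sy = s @ y
            if sy > 0:  # BB step is only defined when s^T y > 0
                eta = (s @ s) / (m * sy)
        step = eta if tlf is None else tlf(eta)
        steps.append(step)
        x_prev, g_prev = x_tilde, g
        x = x_tilde.copy()
        for _ in range(m):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient.
            v = grad_i(A, b, lam, x, i) - grad_i(A, b, lam, x_tilde, i) + g
            x = x - step * v
        x_tilde = x  # last inner iterate becomes the next snapshot
    return x_tilde, steps


if __name__ == "__main__":
    A, b, lam = make_problem()
    x_bb, _ = svrg_bb(A, b, lam)                      # plain SVRG-BB
    x_c, _ = svrg_bb(A, b, lam, tlf=lambda eta: 2e-3)  # constant TLF
    for name, x in [("BB", x_bb), ("constant", x_c)]:
        print(name, np.linalg.norm(full_grad(A, b, lam, x)))
```

Because the BB value is formed once per outer epoch from quantities SVRG already computes (the two latest snapshots and their full gradients), the TLF hook in this sketch costs a single scalar evaluation per epoch.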





Source journal

Applied Mathematics and Optimization (Mathematics: Applied Mathematics)

CiteScore: 3.30

Self-citation rate: 5.60%

Articles published: 103

Review time: >12 weeks

Journal description: The Applied Mathematics and Optimization Journal covers a broad range of mathematical methods, in particular those that bridge with optimization and have some connection with applications. Core topics include calculus of variations, partial differential equations, stochastic control, optimization of deterministic or stochastic systems in discrete or continuous time, homogenization, control theory, mean field games, dynamic games, and optimal transport. Algorithmic, data-analytic, machine learning, and numerical methods that support the modeling and analysis of optimization problems are encouraged. Of great interest are papers that present a novel idea in either theory or modeling and have some connection with potential applications in science and engineering.