Revisiting Stochastic Multi-Level Compositional Optimization.

Wei Jiang, Sifan Yang, Yibo Wang, Tianbao Yang, Lijun Zhang
{"title":"Revisiting Stochastic Multi-Level Compositional Optimization.","authors":"Wei Jiang, Sifan Yang, Yibo Wang, Tianbao Yang, Lijun Zhang","doi":"10.1109/TPAMI.2025.3552197","DOIUrl":null,"url":null,"abstract":"<p><p>This paper explores stochastic multi-level compositional optimization, where the objective function is a composition of multiple smooth functions. Traditional methods for solving this problem suffer from either sub-optimal sample complexities or require huge batch sizes. To address these limitations, we introduce the Stochastic Multi-level Variance Reduction (SMVR) method. In the expectation case, our SMVR method attains the optimal sample complexity of to find an -stationary point for non-convex objectives. When the function satisfies convexity or the Polyak-Łojasiewicz (PL) condition, we propose a stage-wise SMVR variant. This variant improves the sample complexity to for convex functions and for functions meeting the -PL condition or -strong convexity. These complexities match the lower bounds not only in terms of but also in terms of  (for PL or strongly convex functions), without relying on large batch sizes in each iteration. Furthermore, in the finite-sum case, we develop the SMVR-FS algorithm, which can achieve a complexity of for non-convex objectives, for convex functions and for objectives satisfying the -PL condition, where denotes the number of functions in each level. To make use of adaptive learning rates, we propose the Adaptive SMVR method, which maintains the same complexities while demonstrating faster convergence in practice.</p>","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"PP ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TPAMI.2025.3552197","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

This paper explores stochastic multi-level compositional optimization, where the objective function is a composition of multiple smooth functions. Traditional methods for this problem either suffer from sub-optimal sample complexities or require huge batch sizes. To address these limitations, we introduce the Stochastic Multi-level Variance Reduction (SMVR) method. In the expectation case, SMVR attains the optimal sample complexity of $\mathcal{O}(\epsilon^{-3})$ to find an $\epsilon$-stationary point for non-convex objectives. When the function is convex or satisfies the Polyak-Łojasiewicz (PL) condition, we propose a stage-wise SMVR variant. This variant improves the sample complexity to $\mathcal{O}(\epsilon^{-2})$ for convex functions and $\mathcal{O}(\mu^{-1}\epsilon^{-1})$ for functions satisfying the $\mu$-PL condition or $\mu$-strong convexity. These complexities match the lower bounds not only in terms of $\epsilon$ but also in terms of $\mu$ (for PL or strongly convex functions), without relying on large batch sizes in each iteration. Furthermore, in the finite-sum case, we develop the SMVR-FS algorithm, which achieves improved sample complexities in terms of $n$ and $\epsilon$ for non-convex objectives, for convex functions, and for objectives satisfying the $\mu$-PL condition, where $n$ denotes the number of functions in each level. To make use of adaptive learning rates, we propose the Adaptive SMVR method, which maintains the same complexities while demonstrating faster convergence in practice.
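At the core of the SMVR family is a recursive, STORM-style variance-reduced estimator maintained for each level's function value and Jacobian: for an objective $F(x) = f_K(f_{K-1}(\cdots f_1(x)))$, each level's estimator is corrected using the same stochastic sample evaluated at both the current and the previous iterate, which is what removes the need for large batches. The Python sketch below illustrates this recursive variance-reduction template on a toy two-level problem; it is a minimal illustration under assumed names (g_sample, g_jac, f_grad) and an assumed toy objective, not the authors' exact algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level instance: F(x) = f(E[g(x; xi)]) with a noisy linear
# inner map g(x; xi) = A x + sigma * xi and quadratic outer f(u) = 0.5*||u||^2,
# so the true gradient is A^T A x and the unique minimizer is x = 0.
d, sigma = 5, 0.01
A = rng.standard_normal((d, d)) / d

def g_sample(x, xi):           # stochastic inner function value
    return A @ x + sigma * xi

def g_jac(x, zeta):            # stochastic Jacobian of the inner map
    return A + sigma * zeta

def f_grad(u):                 # exact gradient of the outer function
    return u

x = rng.standard_normal(d)
x_prev = x.copy()
u = g_sample(x, rng.standard_normal(d))                   # estimate of g(x)
v = g_jac(x, rng.standard_normal((d, d))).T @ f_grad(u)   # gradient estimate
eta, beta = 0.5, 0.9           # constant beta for simplicity; the analyzed
                               # method uses a decreasing schedule

for t in range(300):
    xi = rng.standard_normal(d)         # one shared sample per estimator,
    zeta = rng.standard_normal((d, d))  # reused at current and past iterates
    # STORM-style correction: fresh sample plus a damped carry-over of the old
    # estimate; evaluating the same sample at x and x_prev cancels most of the
    # stochastic noise without any large batch.
    u_prev, u = u, g_sample(x, xi) + (1 - beta) * (u - g_sample(x_prev, xi))
    grad_now = g_jac(x, zeta).T @ f_grad(u)
    grad_old = g_jac(x_prev, zeta).T @ f_grad(u_prev)
    v = grad_now + (1 - beta) * (v - grad_old)
    x_prev = x.copy()
    x = x - eta * v                     # plain SGD-style step on the estimate

print("||x|| after 300 steps (should be well below its start):",
      np.linalg.norm(x))

In the multi-level case the same correction is applied at every level, feeding level $k$'s value estimate into level $k+1$; this recursion is what underlies the $\mathcal{O}(\epsilon^{-3})$ rate stated above without batch-size growth.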
