Online Learning Under a Separable Stochastic Approximation Framework

Min Gan, Xiang-xiang Su, Guang-yong Chen, Jing Chen, C. L. Philip Chen
{"title":"Online Learning Under a Separable Stochastic Approximation Framework","authors":"Min Gan;Xiang-xiang Su;Guang-yong Chen;Jing Chen;C. L. Philip Chen","doi":"10.1109/TPAMI.2024.3495783","DOIUrl":null,"url":null,"abstract":"We propose an online learning algorithm tailored for a class of machine learning models within a separable stochastic approximation framework. The central idea of our approach is to exploit the inherent separability in many models, recognizing that certain parameters are easier to optimize than others. This paper focuses on models where some parameters exhibit linear characteristics, which are common in machine learning applications. In our proposed algorithm, the linear parameters are updated using the recursive least squares (RLS) algorithm, akin to a stochastic Newton method. Subsequently, based on these updated linear parameters, the nonlinear parameters are adjusted using the stochastic gradient method (SGD). This dual-update mechanism can be viewed as a stochastic approximation variant of block coordinate gradient descent, where one subset of parameters is optimized using a second-order method while the other is handled with a first-order approach. We establish the global convergence of our online algorithm for non-convex cases in terms of the expected violation of first-order optimality conditions. Numerical experiments demonstrate that our method achieves significantly faster initial convergence and produces more robust performance compared to other popular learning algorithms. Additionally, our algorithm exhibits reduced sensitivity to learning rates and outperforms the recently proposed \n<monospace>slimTrain</monospace>\n algorithm (Newman et al. 2022). For validation, the code has been made available on GitHub.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"47 2","pages":"1317-1330"},"PeriodicalIF":0.0000,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10750307/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

We propose an online learning algorithm tailored for a class of machine learning models within a separable stochastic approximation framework. The central idea of our approach is to exploit the inherent separability in many models, recognizing that certain parameters are easier to optimize than others. This paper focuses on models where some parameters exhibit linear characteristics, which are common in machine learning applications. In our proposed algorithm, the linear parameters are updated using the recursive least squares (RLS) algorithm, akin to a stochastic Newton method. Subsequently, based on these updated linear parameters, the nonlinear parameters are adjusted using stochastic gradient descent (SGD). This dual-update mechanism can be viewed as a stochastic approximation variant of block coordinate gradient descent, where one subset of parameters is optimized with a second-order method while the other is handled with a first-order approach. We establish the global convergence of our online algorithm for non-convex cases in terms of the expected violation of first-order optimality conditions. Numerical experiments demonstrate that our method achieves significantly faster initial convergence and more robust performance than other popular learning algorithms. Additionally, our algorithm exhibits reduced sensitivity to learning rates and outperforms the recently proposed slimTrain algorithm (Newman et al. 2022). For validation, the code has been made available on GitHub.
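
To make the dual-update idea concrete, the sketch below illustrates one possible online step for a separable model of the form y ≈ wᵀφ(x; θ): the linear weights w are refreshed by a standard recursive least squares update with a forgetting factor, and the nonlinear parameters θ then take a stochastic gradient step on the squared residual computed with the refreshed w. This is a minimal sketch, not the authors' reference implementation; the tanh feature map, the forgetting factor lam, the step size eta, and the helper names (phi, online_separable_step) are illustrative assumptions.

```python
# Minimal sketch of an RLS (linear block) + SGD (nonlinear block) online update
# for a separable model y ~ phi(x, theta) @ w. Illustrative only; the feature
# map, step sizes, and forgetting factor are assumptions, not the paper's setup.
import numpy as np

def phi(x, theta):
    """Hypothetical nonlinear feature map: d features from input x and parameters theta."""
    return np.tanh(theta @ x)                              # shape (d,)

def dphi_dtheta(x, theta):
    """Jacobian of phi w.r.t. theta, used for the SGD step on the nonlinear block."""
    s = theta @ x                                          # pre-activations, shape (d,)
    # d tanh(s_i)/d theta_ij = (1 - tanh(s_i)^2) * x_j
    return (1.0 - np.tanh(s) ** 2)[:, None] * x[None, :]   # shape (d, n)

def online_separable_step(x, y, w, theta, P, lam=0.99, eta=1e-2):
    """One online update: RLS on the linear weights w, then SGD on theta."""
    z = phi(x, theta)                                      # current features, shape (d,)

    # --- RLS update of the linear parameters (stochastic Newton-like step) ---
    Pz = P @ z
    k = Pz / (lam + z @ Pz)                                # gain vector
    e = y - z @ w                                          # prediction error with old w
    w = w + k * e
    P = (P - np.outer(k, Pz)) / lam                        # covariance update with forgetting

    # --- SGD update of the nonlinear parameters, using the refreshed w ---
    r = y - phi(x, theta) @ w                              # residual after the linear update
    grad_theta = -r * (w[:, None] * dphi_dtheta(x, theta)) # gradient of 0.5 * r^2 w.r.t. theta
    theta = theta - eta * grad_theta
    return w, theta, P

# Toy usage on a synthetic data stream.
rng = np.random.default_rng(0)
n, d = 5, 3                                                # input and feature dimensions
w, theta = np.zeros(d), 0.1 * rng.standard_normal((d, n))
P = 1e3 * np.eye(d)                                        # large initial covariance for RLS
theta_true, w_true = rng.standard_normal((d, n)), rng.standard_normal(d)
for _ in range(2000):
    x = rng.standard_normal(n)
    y = np.tanh(theta_true @ x) @ w_true + 0.01 * rng.standard_normal()
    w, theta, P = online_separable_step(x, y, w, theta, P)
```

In this sketch the linear block sees a second-order (RLS) update while the nonlinear block only takes a first-order step, mirroring the block coordinate interpretation given in the abstract.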