恒步长SGD型算法的平稳性

Proceedings of the ACM on Measurement and Analysis of Computing Systems Pub Date : 2021-11-11 DOI:10.1145/3508039

Zaiwei Chen, Shancong Mou, S. T. Maguluri

{"title":"恒步长SGD型算法的平稳性","authors":"Zaiwei Chen, Shancong Mou, S. T. Maguluri","doi":"10.1145/3508039","DOIUrl":null,"url":null,"abstract":"Stochastic approximation (SA) and stochastic gradient descent (SGD) algorithms are work-horses for modern machine learning algorithms. Their constant stepsize variants are preferred in practice due to fast convergence behavior. However, constant stepsize SA algorithms do not converge to the optimal solution, but instead have a stationary distribution, which in general cannot be analytically characterized. In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero. Specifically, we consider the following three settings: (1) SGD algorithm with a smooth and strongly convex objective, (2) linear SA algorithm involving a Hurwitz matrix, and (3) nonlinear SA algorithm involving a contractive operator. When the iterate is scaled by 1/α, where α is the constant stepsize, we show that the limiting scaled stationary distribution is a solution of an implicit equation. Under a uniqueness assumption (which can be removed in certain settings) on this equation, we further characterize the limiting distribution as a Gaussian distribution whose covariance matrix is the unique solution of a suitable Lyapunov equation. For SA algorithms beyond these cases, our numerical experiments suggest that unlike central limit theorem type results: (1) the scaling factor need not be 1/α, and (2) the limiting distribution need not be Gaussian. Based on the numerical study, we come up with a heuristic formula to determine the right scaling factor, and make insightful connection to the Euler-Maruyama discretization scheme for approximating stochastic differential equations.","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Stationary Behavior of Constant Stepsize SGD Type Algorithms\",\"authors\":\"Zaiwei Chen, Shancong Mou, S. T. Maguluri\",\"doi\":\"10.1145/3508039\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Stochastic approximation (SA) and stochastic gradient descent (SGD) algorithms are work-horses for modern machine learning algorithms. Their constant stepsize variants are preferred in practice due to fast convergence behavior. However, constant stepsize SA algorithms do not converge to the optimal solution, but instead have a stationary distribution, which in general cannot be analytically characterized. In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero. Specifically, we consider the following three settings: (1) SGD algorithm with a smooth and strongly convex objective, (2) linear SA algorithm involving a Hurwitz matrix, and (3) nonlinear SA algorithm involving a contractive operator. When the iterate is scaled by 1/α, where α is the constant stepsize, we show that the limiting scaled stationary distribution is a solution of an implicit equation. Under a uniqueness assumption (which can be removed in certain settings) on this equation, we further characterize the limiting distribution as a Gaussian distribution whose covariance matrix is the unique solution of a suitable Lyapunov equation. For SA algorithms beyond these cases, our numerical experiments suggest that unlike central limit theorem type results: (1) the scaling factor need not be 1/α, and (2) the limiting distribution need not be Gaussian. Based on the numerical study, we come up with a heuristic formula to determine the right scaling factor, and make insightful connection to the Euler-Maruyama discretization scheme for approximating stochastic differential equations.\",\"PeriodicalId\":426760,\"journal\":{\"name\":\"Proceedings of the ACM on Measurement and Analysis of Computing Systems\",\"volume\":\"47 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM on Measurement and Analysis of Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3508039\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3508039","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

随机逼近(SA)和随机梯度下降(SGD)算法是现代机器学习算法的支柱。由于其快速收敛的特性，在实践中更倾向于采用恒定步长变量。然而，恒定步长SA算法不会收敛到最优解，而是具有平稳分布，通常无法解析表征。在本文中，我们研究了适当比例的平稳分布在常步长趋近于零的极限下的渐近行为。具体来说，我们考虑了以下三种设置:(1)具有光滑和强凸目标的SGD算法，(2)涉及Hurwitz矩阵的线性SA算法，(3)涉及压缩算子的非线性SA算法。当迭代被1/α缩放时，其中α是常数步长，我们证明了极限缩放平稳分布是一个隐式方程的解。在该方程的唯一性假设下(在某些情况下可以取消)，我们进一步将极限分布表征为高斯分布，其协方差矩阵是一个合适的Lyapunov方程的唯一解。对于超出这些情况的SA算法，我们的数值实验表明，与中心极限定理类型的结果不同:(1)缩放因子不必是1/α，(2)极限分布不必是高斯分布。在数值研究的基础上，我们提出了一个确定合适比例因子的启发式公式，并将其与近似随机微分方程的Euler-Maruyama离散化格式进行了深刻的联系。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Stationary Behavior of Constant Stepsize SGD Type Algorithms

Stochastic approximation (SA) and stochastic gradient descent (SGD) algorithms are work-horses for modern machine learning algorithms. Their constant stepsize variants are preferred in practice due to fast convergence behavior. However, constant stepsize SA algorithms do not converge to the optimal solution, but instead have a stationary distribution, which in general cannot be analytically characterized. In this work, we study the asymptotic behavior of the appropriately scaled stationary distribution, in the limit when the constant stepsize goes to zero. Specifically, we consider the following three settings: (1) SGD algorithm with a smooth and strongly convex objective, (2) linear SA algorithm involving a Hurwitz matrix, and (3) nonlinear SA algorithm involving a contractive operator. When the iterate is scaled by 1/α, where α is the constant stepsize, we show that the limiting scaled stationary distribution is a solution of an implicit equation. Under a uniqueness assumption (which can be removed in certain settings) on this equation, we further characterize the limiting distribution as a Gaussian distribution whose covariance matrix is the unique solution of a suitable Lyapunov equation. For SA algorithms beyond these cases, our numerical experiments suggest that unlike central limit theorem type results: (1) the scaling factor need not be 1/α, and (2) the limiting distribution need not be Gaussian. Based on the numerical study, we come up with a heuristic formula to determine the right scaling factor, and make insightful connection to the Euler-Maruyama discretization scheme for approximating stochastic differential equations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the ACM on Measurement and Analysis of Computing Systems

CiteScore

3.20

自引率

0.00%

发文量