On the regularization effect of stochastic gradient descent applied to least-squares

S. Steinerberger
ETNA - Electronic Transactions on Numerical Analysis. DOI: 10.1553/etna_vol54s610. Published: 2020-07-27 (Journal Article).
Citations: 5

Abstract

We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E}\, \left\| Ax_{k+1} - b \right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\| A x_k - b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\| A^T A (x_k - x) \right\|^2_{2}.$$ This is a curious inequality: the last term has one more matrix applied to the residual $x_k - x$ than the remaining terms. If $x_k - x$ is mainly composed of large singular vectors, stochastic gradient descent leads to quick regularization. For symmetric matrices, this inequality extends to higher-order Sobolev spaces. This explains a (known) regularization phenomenon: an energy cascade from large to small singular values has a smoothing effect.
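The setting above can be illustrated numerically. The following is a minimal sketch, not the paper's exact scheme: it runs SGD on $f(x) = \|Ax - b\|_2^2$, sampling row $i$ with probability $\|a_i\|^2 / \|A\|_F^2$ and using the importance-weighted step $x \leftarrow x - (\langle a_i, x\rangle - b_i)\, a_i / \|a_i\|^2$ (the randomized Kaczmarz update, a standard SGD variant for this functional); the matrix, singular values, and iteration count are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric invertible matrix with prescribed singular values in [1, 5].
n = 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.linspace(1.0, 5.0, n)
A = Q @ np.diag(s) @ Q.T
x_true = rng.standard_normal(n)
b = A @ x_true

# Sample row i with probability ||a_i||^2 / ||A||_F^2.
row_norms2 = np.sum(A**2, axis=1)
probs = row_norms2 / row_norms2.sum()

x = np.zeros(n)
residuals = [np.linalg.norm(A @ x - b)]
for _ in range(4000):
    i = rng.choice(n, p=probs)
    # Importance-weighted SGD step on the i-th summand of ||Ax - b||^2.
    x = x - (A[i] @ x - b[i]) / row_norms2[i] * A[i]
    residuals.append(np.linalg.norm(A @ x - b))

print(f"initial residual {residuals[0]:.3f}, final residual {residuals[-1]:.3e}")
```

Consistent with the inequality, the error components along large singular vectors are damped fastest, so the residual decays quickly at first and the remaining error concentrates on small singular directions.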