Preconditioned Gradient Descent for Overparameterized Nonconvex Burer-Monteiro Factorization with Global Optimality Certification

G. Zhang, S. Fattahi, Richard Y. Zhang
J. Mach. Learn. Res. 23(1), 163:1-163:55. Journal article, published 2022-06-07.
DOI: 10.48550/arXiv.2206.03345 (https://doi.org/10.48550/arXiv.2206.03345)
Citations: 4

Abstract

We consider using gradient descent to minimize the nonconvex function $f(X)=\phi(XX^{T})$ over an $n\times r$ factor matrix $X$, in which $\phi$ is an underlying smooth convex cost function defined over $n\times n$ matrices. While only a second-order stationary point $X$ can be provably found in reasonable time, if $X$ is additionally rank deficient, then its rank deficiency certifies it as being globally optimal. This way of certifying global optimality necessarily requires the search rank $r$ of the current iterate $X$ to be overparameterized with respect to the rank $r^{\star}$ of the global minimizer $X^{\star}$. Unfortunately, overparameterization significantly slows down the convergence of gradient descent, from a linear rate with $r=r^{\star}$ to a sublinear rate when $r>r^{\star}$, even when $\phi$ is strongly convex. In this paper, we propose an inexpensive preconditioner that restores the convergence rate of gradient descent back to linear in the overparameterized case, while also making it agnostic to possible ill-conditioning in the global minimizer $X^{\star}$.
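A small numerical sketch can make the claimed slowdown and its fix concrete. The abstract does not state the form of the preconditioner, so the code below assumes a damped right preconditioner in the style of scaled gradient descent, $X \leftarrow X - \alpha\,\nabla f(X)\,(X^{T}X + \eta I)^{-1}$ with damping $\eta = \|XX^{T} - M^{\star}\|_F$. Treat that update, the quadratic cost $\phi(M) = \tfrac12\|M - M^{\star}\|_F^2$ (strongly convex, giving $\nabla f(X) = 2\nabla\phi(XX^{T})X$), and the hypothetical helpers `loss_and_grad` and `run` with their parameters as illustrative assumptions, not the authors' exact algorithm. The target has rank $r^{\star} = 2$ and the search rank is $r = 4$.

```python
import numpy as np


def loss_and_grad(X, M_star):
    """Assumed cost f(X) = 0.5 * ||X X^T - M*||_F^2 and its gradient 2 (X X^T - M*) X."""
    R = X @ X.T - M_star  # residual; symmetric because M_star is symmetric
    return 0.5 * np.sum(R * R), 2.0 * R @ X


def run(X0, M_star, step, iters, precondition, tol=1e-20):
    """Plain GD (precondition=False) or damped right-preconditioned GD (True).

    Assumed preconditioned step: X <- X - step * grad @ inv(X^T X + eta I),
    with damping eta = ||X X^T - M*||_F. Returns the final loss.
    """
    X = X0.copy()
    r = X.shape[1]
    for _ in range(iters):
        f, G = loss_and_grad(X, M_star)
        if f < tol:  # stop before round-off dominates the tiny damping term
            break
        if precondition:
            eta = np.sqrt(2.0 * f)  # equals ||X X^T - M*||_F
            P = X.T @ X + eta * np.eye(r)  # r x r, symmetric positive definite
            # G @ inv(P) computed as solve(P, G^T)^T, valid because P = P^T
            G = np.linalg.solve(P, G.T).T
        X = X - step * G
    return loss_and_grad(X, M_star)[0]


rng = np.random.default_rng(0)
n, r_star, r = 10, 2, 4  # search rank r exceeds true rank r_star (overparameterized)
Q, _ = np.linalg.qr(rng.standard_normal((n, r_star)))  # orthonormal columns
M_star = Q @ np.diag([1.0, 0.25]) @ Q.T  # rank-2 PSD target, condition number 4
X0 = 0.1 * rng.standard_normal((n, r))  # small random initialization

loss_gd = run(X0, M_star, step=0.2, iters=500, precondition=False)
loss_prec = run(X0, M_star, step=0.2, iters=500, precondition=True)
print(f"plain GD: {loss_gd:.2e}   preconditioned GD: {loss_prec:.2e}")
```

With this setup, plain gradient descent is expected to end at a small but clearly nonzero loss, because with $r > r^{\star}$ some error directions shrink only through a cubic term in the gradient, whereas the preconditioned iteration should reach the stopping tolerance at a linear rate. The early stop at `tol` is a practical guard added for this sketch: it keeps $\eta$ well above floating-point round-off so that $X^{T}X + \eta I$ stays safely invertible.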