Sijia Liu, Xingguo Li, Pin-Yu Chen, J. Haupt, Lisa Amini
{"title":"非凸优化的零阶随机投影梯度下降","authors":"Sijia Liu, Xingguo Li, Pin-Yu Chen, J. Haupt, Lisa Amini","doi":"10.1109/GlobalSIP.2018.8646618","DOIUrl":null,"url":null,"abstract":"In this paper, we analyze the convergence of the zeroth-order stochastic projected gradient descent (ZO-SPGD) method for constrained convex and nonconvex optimization scenarios where only objective function values (not gradients) are directly available. We show statistical properties of a new random gradient estimator, constructed through random direction samples drawn from a bounded uniform distribution. We prove that ZO-SPGD yields a $O\\left( {\\frac{d}{{bq\\sqrt T }} + \\frac{1}{{\\sqrt T }}} \\right)$ convergence rate for convex but non-smooth optimization, where d is the number of optimization variables, b is the minibatch size, q is the number of random direction samples for gradient estimation, and T is the number of iterations. For nonconvex optimization, we show that ZO-SPGD achieves $O\\left( {\\frac{1}{{\\sqrt T }}} \\right)$ convergence rate but suffers an additional $O\\left( {\\frac{{d + q}}{{bq}}} \\right)$ error. Our the oretical investigation on ZO-SPGD provides a general framework to study the convergence rate of zeroth-order algorithms.","PeriodicalId":119131,"journal":{"name":"2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":"{\"title\":\"ZEROTH-ORDER STOCHASTIC PROJECTED GRADIENT DESCENT FOR NONCONVEX OPTIMIZATION\",\"authors\":\"Sijia Liu, Xingguo Li, Pin-Yu Chen, J. Haupt, Lisa Amini\",\"doi\":\"10.1109/GlobalSIP.2018.8646618\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we analyze the convergence of the zeroth-order stochastic projected gradient descent (ZO-SPGD) method for constrained convex and nonconvex optimization scenarios where only objective function values (not gradients) are directly available. We show statistical properties of a new random gradient estimator, constructed through random direction samples drawn from a bounded uniform distribution. We prove that ZO-SPGD yields a $O\\\\left( {\\\\frac{d}{{bq\\\\sqrt T }} + \\\\frac{1}{{\\\\sqrt T }}} \\\\right)$ convergence rate for convex but non-smooth optimization, where d is the number of optimization variables, b is the minibatch size, q is the number of random direction samples for gradient estimation, and T is the number of iterations. For nonconvex optimization, we show that ZO-SPGD achieves $O\\\\left( {\\\\frac{1}{{\\\\sqrt T }}} \\\\right)$ convergence rate but suffers an additional $O\\\\left( {\\\\frac{{d + q}}{{bq}}} \\\\right)$ error. 
Our the oretical investigation on ZO-SPGD provides a general framework to study the convergence rate of zeroth-order algorithms.\",\"PeriodicalId\":119131,\"journal\":{\"name\":\"2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"22\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/GlobalSIP.2018.8646618\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/GlobalSIP.2018.8646618","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 22
Abstract
In this paper, we analyze the convergence of the zeroth-order stochastic projected gradient descent (ZO-SPGD) method for constrained convex and nonconvex optimization scenarios where only objective function values (not gradients) are directly available. We establish statistical properties of a new random gradient estimator, constructed from random direction samples drawn from a bounded uniform distribution. We prove that ZO-SPGD yields an $O\left( \frac{d}{bq\sqrt{T}} + \frac{1}{\sqrt{T}} \right)$ convergence rate for convex but non-smooth optimization, where $d$ is the number of optimization variables, $b$ is the minibatch size, $q$ is the number of random direction samples used for gradient estimation, and $T$ is the number of iterations. For nonconvex optimization, we show that ZO-SPGD achieves an $O\left( \frac{1}{\sqrt{T}} \right)$ convergence rate but suffers an additional $O\left( \frac{d + q}{bq} \right)$ error. Our theoretical investigation of ZO-SPGD provides a general framework for studying the convergence rates of zeroth-order algorithms.
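For concreteness, a common way to build such a function-value-based gradient estimator (a sketch of the standard construction; the paper's exact estimator, sampling distribution, and smoothing parameter may differ) is

$$\hat{\nabla} f(x) = \frac{d}{\mu q} \sum_{i=1}^{q} \big[ f(x + \mu u_i) - f(x) \big]\, u_i,$$

where $\mu > 0$ is a small smoothing parameter introduced here for illustration and $u_1, \dots, u_q$ are random direction samples (e.g., drawn uniformly from the unit sphere). Each term uses only two objective evaluations, which is why the method applies when gradients are unavailable.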
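The following is a minimal Python sketch of ZO-SPGD built on an estimator of this kind, not the paper's implementation: the names `zo_gradient_estimate`, `project_l2_ball`, and `zo_spgd`, the uniform-on-sphere direction sampling, the step size, and the $\ell_2$-ball constraint set are all illustrative assumptions.

```python
import numpy as np

def zo_gradient_estimate(f, x, q=10, mu=1e-3, rng=None):
    """Estimate grad f(x) from function values only, using q random
    directions drawn uniformly from the unit sphere (a common zeroth-order
    construction; the paper's estimator may differ in detail)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.size
    grad = np.zeros(d)
    for _ in range(q):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                 # uniform direction on the unit sphere
        grad += (f(x + mu * u) - f(x)) / mu * u
    return (d / q) * grad                      # combined factor d / (mu * q)

def project_l2_ball(x, radius=1.0):
    """Euclidean projection onto an l2 ball (example constraint set)."""
    nrm = np.linalg.norm(x)
    return x if nrm <= radius else (radius / nrm) * x

def zo_spgd(f_samples, x0, T=500, lr=0.05, b=8, q=10, radius=1.0, seed=0):
    """Zeroth-order stochastic projected gradient descent sketch: average the
    ZO gradient over a minibatch of b sampled objectives, step, then project."""
    rng = np.random.default_rng(seed)
    x = project_l2_ball(np.asarray(x0, dtype=float), radius)
    for _ in range(T):
        batch = [f_samples(rng) for _ in range(b)]          # draw b stochastic objectives
        g = np.mean([zo_gradient_estimate(fi, x, q, rng=rng) for fi in batch], axis=0)
        x = project_l2_ball(x - lr * g, radius)
    return x

# Toy usage: stochastic least squares constrained to the unit l2 ball.
if __name__ == "__main__":
    d = 20
    A, y = np.random.randn(100, d), np.random.randn(100)
    def f_samples(rng):
        i = rng.integers(len(y))                             # one random data point
        return lambda x, Ai=A[i], yi=y[i]: 0.5 * (Ai @ x - yi) ** 2
    x_hat = zo_spgd(f_samples, np.zeros(d))
    print("final loss:", 0.5 * np.mean((A @ x_hat - y) ** 2))
```

The per-iteration cost scales with $bq$ objective evaluations, which is the trade-off reflected in the $O\left( \frac{d}{bq\sqrt{T}} \right)$ and $O\left( \frac{d+q}{bq} \right)$ terms of the stated rates: larger minibatches and more direction samples reduce estimator variance at the price of more function queries.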