Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms

IF 2.7; Zone 4 (Computer Science); JCR Q3: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Bin Gu;Xiyuan Wei;Hualin Zhang;Yi Chang;Heng Huang
Neural Computation, vol. 36, no. 5, pp. 897-935. Published 2024-04-23. Journal Article. DOI: 10.1162/neco_a_01636
Listing URL: https://ieeexplore.ieee.org/document/10535065/
Citations: 0

Abstract

Obtaining Lower Query Complexities Through Lightweight Zeroth-Order Proximal Gradient Algorithms
Zeroth-order (ZO) optimization is a key technique for machine learning problems where gradient calculation is expensive or impossible. Several variance-reduced ZO proximal algorithms have been proposed to speed up ZO optimization for nonsmooth problems, and all of them opted for the coordinated ZO estimator over the random ZO estimator when approximating the true gradient, since the former is more accurate. While the random ZO estimator introduces a larger error and makes convergence analysis more challenging than the coordinated ZO estimator does, it requires only $O(1)$ computation, which is significantly less than the $O(d)$ computation of the coordinated ZO estimator, with $d$ being the dimension of the problem space. To take advantage of the computationally efficient nature of the random ZO estimator, we first propose a ZO objective decrease (ZOOD) property that can incorporate two different types of errors in the upper bound of the convergence rate. Next, we propose two generic reduction frameworks for ZO optimization, which can automatically derive the convergence results for convex and nonconvex problems, respectively, as long as the convergence rate for the inner solver satisfies the ZOOD property. By applying the two reduction frameworks to our proposed ZOR-ProxSVRG and ZOR-ProxSAGA, two variance-reduced ZO proximal algorithms with fully random ZO estimators, we improve the state-of-the-art function query complexities from $\mathcal{O}\left(\min\left\{\frac{d n^{1/2}}{\epsilon^{2}}, \frac{d}{\epsilon^{3}}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{n+d}{\epsilon^{2}}\right)$ under $d > n^{1/2}$ for nonconvex problems, and from $\mathcal{O}\left(\frac{d}{\epsilon^{2}}\right)$ to $\tilde{\mathcal{O}}\left(n \log\frac{1}{\epsilon} + \frac{d}{\epsilon}\right)$ for convex problems. Finally, we conduct experiments to verify the superiority of our proposed methods.
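The cost gap between the two estimator families mentioned in the abstract can be illustrated with a minimal sketch. The formulas below are the textbook coordinate-wise central difference and random-direction forward difference from the ZO literature; they are assumed here for illustration and are not necessarily the exact estimators analyzed in the paper. The function names, the smoothing parameter `mu`, and the test objective `quadratic` are all hypothetical.

```python
import math
import random


def coordinate_zo_gradient(f, x, mu=1e-5):
    """Coordinate-wise estimator (assumed textbook form).

    Uses a central difference along each of the d axes,
    so it spends 2*d function queries per gradient estimate.
    """
    d = len(x)
    grad = []
    for i in range(d):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += mu
        x_minus[i] -= mu
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * mu))
    return grad, 2 * d


def random_zo_gradient(f, x, mu=1e-5, rng=random):
    """Random-direction estimator (assumed textbook form).

    Draws one direction u uniformly on the unit sphere and uses a
    single forward difference, so it spends 2 queries regardless of d.
    The factor d makes the estimate approximately unbiased for small mu.
    """
    d = len(x)
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    u = [v / norm for v in u]
    x_shift = [xi + mu * ui for xi, ui in zip(x, u)]
    scale = d * (f(x_shift) - f(x)) / mu
    return [scale * ui for ui in u], 2


def quadratic(x):
    """Hypothetical smooth test objective; its true gradient is x itself."""
    return 0.5 * sum(v * v for v in x)


x0 = [1.0, -2.0, 3.0, 0.5]
g_coord, q_coord = coordinate_zo_gradient(quadratic, x0)
g_rand, q_rand = random_zo_gradient(quadratic, x0, rng=random.Random(0))
print("coordinate estimator queries:", q_coord)
print("random estimator queries:", q_rand)
```

A single random estimate is noisy, which is the larger error the abstract refers to; the query count per estimate, however, stays constant as the dimension grows, whereas the coordinate-wise count grows linearly with `d`.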
Source journal
Neural Computation (Engineering and Technology; Computer Science: Artificial Intelligence)
CiteScore: 6.30
Self-citation rate: 3.40%
Articles published: 83
Review time: 3.0 months
Journal description: Neural Computation is uniquely positioned at the crossroads between neuroscience and TMCS and welcomes the submission of original papers from all areas of TMCS, including: Advanced experimental design; Analysis of chemical sensor data; Connectomic reconstructions; Analysis of multielectrode and optical recordings; Genetic data for cell identity; Analysis of behavioral data; Multiscale models; Analysis of molecular mechanisms; Neuroinformatics; Analysis of brain imaging data; Neuromorphic engineering; Principles of neural coding, computation, circuit dynamics, and plasticity; Theories of brain function.