Convergence of Adaptive Stochastic Mirror Descent.

IF 8.9 | Zone 1: Computer Science | Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Neural Networks and Learning Systems. Pub Date: 2025-08-01. DOI: 10.1109/TNNLS.2025.3545420

Ting Hu, Xiaotong Liu, Kai Ji, Yunwen Lei


Citations: 0

Abstract

In this article, we present a family of adaptive stochastic optimization methods associated with mirror maps, which are widely used to capture the geometric properties of optimization problems during the iteration process. The well-known adaptive moment estimation (Adam)-type algorithms fall into this family when the mirror maps take the form of temporal adaptation. For convex objective functions, we show that, with proper step sizes and hyperparameters, the average regret achieves the convergence rate $\mathcal{O}(T^{-1/2})$ after $T$ iterations under some standard assumptions. We further improve this rate to $\mathcal{O}(T^{-1}\log T)$ when the objective functions are strongly convex. For smooth objective functions (not necessarily convex), using properties of the strongly convex differentiable mirror map, our algorithms achieve convergence rates of order $\mathcal{O}(T^{-1/2})$ up to a logarithmic term, provided that the hyperparameters are large or increasing, which is consistent with the practical usage of Adam-type algorithms. Thus, our work explains the selection of hyperparameters in implementations of Adam-type algorithms.
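The abstract does not state the update rule, so the following is only a minimal sketch of how an Adam-type step can be read as stochastic mirror descent. It assumes (this is not taken from the paper) a time-varying quadratic mirror map $\psi_t(w) = \frac{1}{2}\sum_i h_{t,i} w_i^2$ with $h_{t,i} = \sqrt{\hat v_{t,i}} + \epsilon$, for which the unconstrained mirror step has the closed form $w_{t+1} = w_t - \eta_t g_t / h_t$. The function names, the $\eta/\sqrt{t}$ step size, the bias correction, and the omission of momentum are all illustrative choices.

```python
import math
import random


def adaptive_mirror_descent(grad, w0, steps, eta=0.1, beta2=0.999, eps=1e-8):
    """Stochastic mirror descent with a time-varying quadratic mirror map.

    Illustrative assumption: psi_t(w) = 0.5 * sum_i h_i * w_i**2, where
    h_i = sqrt(v_hat_i) + eps and v is an exponential moving average of
    squared gradients. The mirror step then reduces to w <- w - eta_t * g / h.
    Returns the last iterate and the running average of the iterates.
    """
    w = list(w0)
    v = [0.0] * len(w)
    avg = [0.0] * len(w)
    for t in range(1, steps + 1):
        g = grad(w)
        eta_t = eta / math.sqrt(t)  # decaying step size (assumed schedule)
        for i in range(len(w)):
            v[i] = beta2 * v[i] + (1.0 - beta2) * g[i] ** 2
            # Adam-style bias correction (an assumption of this sketch)
            h = math.sqrt(v[i] / (1.0 - beta2 ** t)) + eps
            w[i] -= eta_t * g[i] / h
            avg[i] += (w[i] - avg[i]) / t  # incremental mean of iterates
    return w, avg


def noisy_quadratic_grad(target, sigma, rng):
    """Hypothetical test oracle: gradient of 0.5*||w - target||^2 plus Gaussian noise."""
    def grad(w):
        return [wi - ti + rng.gauss(0.0, sigma) for wi, ti in zip(w, target)]
    return grad


# Usage: minimize a noisy quadratic starting from (3, -1); the minimizer is (1, 1).
oracle = noisy_quadratic_grad([1.0, 1.0], sigma=0.1, rng=random.Random(0))
w_last, w_avg = adaptive_mirror_descent(oracle, [3.0, -1.0], steps=2000)
print("last iterate:", w_last)
print("averaged iterate:", w_avg)
```

In this sketch, `beta2` plays the role of the hyperparameter that the abstract says should be large or increasing; a value close to 1 makes the mirror map change slowly from one iteration to the next.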




Source journal

IEEE Transactions on Neural Networks and Learning Systems (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, HARDWARE & ARCHITECTURE)

CiteScore: 23.80

Self-citation rate: 9.60%

Articles published: 2102

Review time: 3-8 weeks

About the journal: IEEE Transactions on Neural Networks and Learning Systems presents scholarly articles on the theory, design, and applications of neural networks and other learning systems. The journal primarily highlights technical and scientific research in this domain.