{"title":"论有偏梯度的分散随机梯度下降的收敛性","authors":"Yiming Jiang;Helei Kang;Jinlan Liu;Dongpo Xu","doi":"10.1109/TSP.2025.3531356","DOIUrl":null,"url":null,"abstract":"Stochastic optimization algorithms are widely used to solve large-scale machine learning problems. However, their theoretical analysis necessitates access to unbiased estimates of the true gradients. To address this issue, we perform a comprehensive convergence rate analysis of stochastic gradient descent (SGD) with biased gradients for decentralized optimization. In non-convex settings, we show that for decentralized SGD utilizing biased gradients, the gradient in expectation is bounded asymptotically at a rate of <inline-formula><tex-math>$\\mathcal{O}(1/\\sqrt{nT}+n/T)$</tex-math></inline-formula>, and the bound is linearly correlated to the biased gradient gap. In particular, we can recover the convergence results in the unbiased stochastic gradient setting when the biased gradient gap is zero. Lastly, we provide empirical support for our theoretical findings through extensive numerical experiments.","PeriodicalId":13330,"journal":{"name":"IEEE Transactions on Signal Processing","volume":"73 ","pages":"549-558"},"PeriodicalIF":4.6000,"publicationDate":"2025-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"On the Convergence of Decentralized Stochastic Gradient Descent With Biased Gradients\",\"authors\":\"Yiming Jiang;Helei Kang;Jinlan Liu;Dongpo Xu\",\"doi\":\"10.1109/TSP.2025.3531356\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Stochastic optimization algorithms are widely used to solve large-scale machine learning problems. However, their theoretical analysis necessitates access to unbiased estimates of the true gradients. To address this issue, we perform a comprehensive convergence rate analysis of stochastic gradient descent (SGD) with biased gradients for decentralized optimization. In non-convex settings, we show that for decentralized SGD utilizing biased gradients, the gradient in expectation is bounded asymptotically at a rate of <inline-formula><tex-math>$\\\\mathcal{O}(1/\\\\sqrt{nT}+n/T)$</tex-math></inline-formula>, and the bound is linearly correlated to the biased gradient gap. In particular, we can recover the convergence results in the unbiased stochastic gradient setting when the biased gradient gap is zero. 
Lastly, we provide empirical support for our theoretical findings through extensive numerical experiments.\",\"PeriodicalId\":13330,\"journal\":{\"name\":\"IEEE Transactions on Signal Processing\",\"volume\":\"73 \",\"pages\":\"549-558\"},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2025-01-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Signal Processing\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10847585/\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Signal Processing","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10847585/","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
On the Convergence of Decentralized Stochastic Gradient Descent With Biased Gradients
Stochastic optimization algorithms are widely used to solve large-scale machine learning problems. However, their theoretical analysis typically requires access to unbiased estimates of the true gradients. To address this issue, we perform a comprehensive convergence rate analysis of stochastic gradient descent (SGD) with biased gradients for decentralized optimization. In non-convex settings, we show that for decentralized SGD with biased gradients, the expected gradient norm is bounded asymptotically at a rate of $\mathcal{O}(1/\sqrt{nT}+n/T)$, and the bound depends linearly on the biased gradient gap. In particular, when the biased gradient gap is zero, we recover the convergence results of the unbiased stochastic gradient setting. Lastly, we provide empirical support for our theoretical findings through extensive numerical experiments.
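The abstract describes the scheme being analyzed: each node mixes its iterate with its neighbors through a doubly stochastic matrix and then takes a step along a biased stochastic gradient. The following is a minimal, hypothetical NumPy sketch of such a scheme, not the authors' implementation or experimental setup; the ring topology, quadratic local objectives, step size, and the constant-bias-plus-noise gradient model are all illustrative assumptions.

```python
# Minimal sketch of decentralized SGD with biased stochastic gradients on a
# ring topology. Problem sizes, objectives, and the bias/noise model are
# assumptions made for illustration only, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)

n, d, T = 8, 10, 2000                 # nodes, dimension, iterations (assumed)
eta = 0.05                            # step size (assumed)
bias_level, noise_level = 0.01, 0.1   # biased gradient gap and noise scale (assumed)

# Local quadratic objectives f_i(x) = 0.5 * ||A_i x - b_i||^2
A = rng.standard_normal((n, d, d)) / np.sqrt(d)
b = rng.standard_normal((n, d))

def biased_stochastic_grad(i, x):
    """True local gradient plus a fixed bias and zero-mean noise."""
    grad = A[i].T @ (A[i] @ x - b[i])
    bias = bias_level * np.ones(d)            # models the biased gradient gap
    noise = noise_level * rng.standard_normal(d)
    return grad + bias + noise

# Doubly stochastic mixing matrix for a ring: average with the two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

x = np.zeros((n, d))                           # one parameter copy per node
for t in range(T):
    # Gossip (consensus) step followed by a local biased-gradient step.
    x_mixed = W @ x
    grads = np.stack([biased_stochastic_grad(i, x_mixed[i]) for i in range(n)])
    x = x_mixed - eta * grads

# Monitor the full gradient norm at the network-average iterate.
x_bar = x.mean(axis=0)
full_grad = np.mean([A[i].T @ (A[i] @ x_bar - b[i]) for i in range(n)], axis=0)
print("||grad f(x_bar)|| =", np.linalg.norm(full_grad))
```

In this toy model, the fixed bias term plays the role of the biased gradient gap: shrinking bias_level toward zero recovers ordinary unbiased decentralized SGD, while a larger value leaves a proportionally larger residual gradient norm, consistent with the linear dependence stated in the abstract.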
Journal Description:
The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term “signal” includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.