STSyn: Speeding Up Local SGD With Straggler-Tolerant Synchronization

IF 4.6 · Zone 2 (Engineering & Technology) · Q1 · ENGINEERING, ELECTRICAL & ELECTRONIC
Feng Zhu;Jingjing Zhang;Xin Wang
Journal: IEEE Transactions on Signal Processing, vol. 72, pp. 4050-4064
Published: 2024-08-30
DOI: 10.1109/TSP.2024.3452035
URL: https://ieeexplore.ieee.org/document/10659740/
Publication type: Journal Article (not open access)
Citations: 0

Abstract

Synchronous local stochastic gradient descent (local SGD) suffers from idle workers and random delays caused by slow, straggling workers, because it waits for every worker to complete the same number of local updates. To address this issue, this paper proposes a novel local SGD strategy called STSyn. The key idea is to wait for the $K$ fastest workers in each synchronization round while keeping all workers computing continuously, and to make full use of every effective (completed) local update of each worker regardless of stragglers. To evaluate the performance of STSyn, the paper analyzes the average wall-clock time, average number of local updates, and average number of uploading workers per round. The convergence of STSyn is also rigorously established, even for nonconvex objective functions, under both homogeneous and heterogeneous data distributions. Experimental results highlight the superiority of STSyn over state-of-the-art schemes, thanks to its straggler-tolerant technique and the inclusion of additional effective local updates at each worker. Furthermore, the impact of system parameters is investigated. By waiting for the faster workers and allowing heterogeneous synchronization with different numbers of local updates across workers, STSyn provides substantial improvements in both time and communication efficiency.
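The three per-round quantities named in the abstract (wall-clock time, local updates per worker, uploading workers) can be illustrated with a small timing simulation. This is only a sketch under assumptions that the abstract does not state: each local update takes an i.i.d. exponential time, a round ends when the $K$-th fastest worker has finished `u` local updates, no worker performs more than `u` updates in a round, and a worker uploads if it completed at least one update. The names `simulate_round`, `average_metrics`, `u`, and `mean_update_time` are hypothetical and are not taken from the paper.

```python
import random
from itertools import accumulate


def simulate_round(num_workers, k, u, rng, mean_update_time=1.0):
    """Simulate the timing of one synchronization round (illustrative only)."""
    # finish_times[i][j] is the time at which worker i completes its
    # (j + 1)-th local update. ASSUMPTION: i.i.d. exponential update times.
    finish_times = [
        list(accumulate(rng.expovariate(1.0 / mean_update_time) for _ in range(u)))
        for _ in range(num_workers)
    ]
    # ASSUMPTION: the round ends when the k-th fastest worker finishes u updates.
    round_time = sorted(times[-1] for times in finish_times)[k - 1]
    # Every worker keeps computing until the round ends; only fully
    # completed updates count (capped at u by construction).
    updates_done = [
        sum(1 for t in times if t <= round_time) for times in finish_times
    ]
    # ASSUMPTION: a worker uploads if it completed at least one update.
    uploading = sum(1 for done in updates_done if done > 0)
    return round_time, updates_done, uploading


def average_metrics(num_rounds=1000, num_workers=8, k=4, u=5, seed=0):
    """Monte Carlo estimates of the three per-round averages."""
    rng = random.Random(seed)
    total_time = total_updates = total_uploading = 0.0
    for _ in range(num_rounds):
        round_time, updates_done, uploading = simulate_round(num_workers, k, u, rng)
        total_time += round_time
        total_updates += sum(updates_done) / num_workers
        total_uploading += uploading
    return (
        total_time / num_rounds,
        total_updates / num_rounds,
        total_uploading / num_rounds,
    )


if __name__ == "__main__":
    for k in (2, 4, 8):
        print(k, average_metrics(k=k))
```

In this toy model, setting `k` equal to the number of workers reproduces fully synchronous local SGD, where every worker finishes all `u` updates and the round lasts as long as the slowest worker. Smaller `k` shortens the round at the cost of fewer completed updates from the stragglers.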
Source journal
IEEE Transactions on Signal Processing (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 11.20
Self-citation rate: 9.30%
Articles published: 310
Review time: 3.0 months
Journal description: The IEEE Transactions on Signal Processing covers novel theory, algorithms, performance analyses and applications of techniques for the processing, understanding, learning, retrieval, mining, and extraction of information from signals. The term "signal" includes, among others, audio, video, speech, image, communication, geophysical, sonar, radar, medical and musical signals. Examples of topics of interest include, but are not limited to, information processing and the theory and application of filtering, coding, transmitting, estimating, detecting, analyzing, recognizing, synthesizing, recording, and reproducing signals.