Asynchronous fully-decentralized SGD in the cluster-based model

IF 0.9, Zone 4 (Computer Science), JCR Q3 (COMPUTER SCIENCE, THEORY & METHODS)

Theoretical Computer Science, Pub Date: 2025-01-15, DOI: 10.1016/j.tcs.2025.115073

Hagit Attiya, Noa Schiller

Theoretical Computer Science, volume 1031, Article 115073. https://www.sciencedirect.com/science/article/pii/S0304397525000118

Citations: 0

Abstract

This paper presents fault-tolerant asynchronous Stochastic Gradient Descent (SGD) algorithms. SGD is widely used for approximating the minimum of a cost function Q, a core part of optimization and learning algorithms. Our algorithms are designed for the cluster-based model, which combines message-passing and shared-memory communication layers. Processes may fail by crashing, and the algorithm inside each cluster is wait-free, using only reads and writes.
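For readers unfamiliar with the baseline, the following is a minimal sketch of plain sequential SGD, not the fault-tolerant algorithms of the paper. The cost function, the noise model, the step-size schedule, and all names (`sgd`, `make_noisy_grad`, `TARGET`) are illustrative assumptions.

```python
import random

def sgd(grad_estimate, x0, steps, lr):
    """Sequential SGD: x_t = x_{t-1} - lr(t) * g_t, where g_t is a noisy gradient."""
    x = list(x0)
    for t in range(1, steps + 1):
        g = grad_estimate(x)
        step = lr(t)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Assumed toy cost: Q(x) = 0.5 * ||x - TARGET||^2, whose gradient is x - TARGET.
TARGET = [3.0, -1.0]

def make_noisy_grad(seed):
    """Return a gradient oracle that adds Gaussian noise to the true gradient."""
    rng = random.Random(seed)
    def noisy_grad(x):
        return [xi - ti + rng.gauss(0.0, 0.1) for xi, ti in zip(x, TARGET)]
    return noisy_grad

# Decreasing step size 1/t, a common choice for strongly convex costs.
x_final = sgd(make_noisy_grad(0), x0=[0.0, 0.0], steps=2000, lr=lambda t: 1.0 / t)
print(x_final)
```

With this particular cost and step size, each iterate is the running average of noisy observations of `TARGET`, so the result lands close to the minimizer.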

For a strongly convex Q, our algorithm can withstand partitions of the system. It achieves a convergence rate that attains the maximal distributed acceleration over the optimal convergence rate of sequential SGD.

For arbitrary smooth functions, the convergence rate has an additional term that depends on the maximal difference between the parameters at the same iteration. (This holds under standard assumptions on Q.) In this case, the algorithm obtains the same convergence rate as sequential SGD, up to a logarithmic factor. This is achieved by using, at each iteration, a multidimensional approximate agreement algorithm, tailored for the cluster-based model.
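To give a feel for why an agreement step bounds the difference between parameters, here is a toy simulation in which every process replaces its parameter vector with the mean of a majority-sized sample of the current vectors. This is only an illustration under simplifying assumptions (synchronous rounds, no crashes, random quorums); it is not the paper's multidimensional approximate agreement algorithm, and `spread`, `averaging_round`, `n`, and `quorum` are hypothetical names.

```python
import random

def spread(points):
    """Largest per-coordinate gap between any two parameter vectors."""
    return max(max(c) - min(c) for c in zip(*points))

def averaging_round(points, quorum, rng):
    """Each process adopts the mean of `quorum` vectors chosen at random."""
    new_points = []
    for _ in points:
        heard = rng.sample(points, quorum)
        new_points.append([sum(c) / quorum for c in zip(*heard)])
    return new_points

rng = random.Random(1)
n, quorum = 5, 3  # any two quorums of size 3 out of 5 intersect
params = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
initial = spread(params)
for _ in range(20):
    params = averaging_round(params, quorum, rng)
final = spread(params)
print(initial, final)
```

Because two quorums of size m out of n differ in at most n - m members, the per-coordinate spread shrinks by a factor of at most (n - m) / m per round in this toy setting, here 2/3.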

The general algorithm communicates with nonfaulty processes belonging to clusters that include a majority of all processes. We prove that this condition is necessary when optimizing some non-convex functions.
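One way to read the majority condition is as a predicate over the clusters of the processes that responded. The sketch below encodes that reading; the cluster layout and the names `covers_majority`, `sizes`, and `responding` are assumptions made for illustration, not definitions from the paper.

```python
def covers_majority(cluster_sizes, responding):
    """True if the clusters named in `responding` together hold
    more than half of all processes in the system."""
    total = sum(cluster_sizes.values())
    covered = sum(cluster_sizes[c] for c in set(responding))
    return 2 * covered > total

# Hypothetical system: 7 processes split into three clusters.
sizes = {"A": 3, "B": 2, "C": 2}
print(covers_majority(sizes, ["A"]))       # cluster A alone holds 3 of 7
print(covers_majority(sizes, ["A", "B"]))  # clusters A and B hold 5 of 7
```

A single response from a cluster counts for the whole cluster here, reflecting the idea that processes in the same cluster share memory.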


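Source journal

Theoretical Computer Science (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 2.60

Self-citation rate: 18.20%

Articles published: 471

Review time: 12.6 months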

About the journal: Theoretical Computer Science is mathematical and abstract in spirit, but it derives its motivation from practical and everyday computation. Its aim is to understand the nature of computation and, as a consequence of this understanding, provide more efficient methodologies. All papers introducing or studying mathematical, logic and formal concepts and methods are welcome, provided that their motivation is clearly drawn from the field of computing.