Byzantine-Resilient Gradient Coding Through Local Gradient Computations

Impact factor 2.2; Zone 3, Computer Science; JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Information Theory Pub Date : 2025-02-18 DOI:10.1109/TIT.2025.3542896

Christoph Hofmeister;Luis Maßny;Eitan Yaakobi;Rawad Bitar

Volume 71, issue 4, pages 3142-3156. Article page: https://ieeexplore.ieee.org/document/10891921/

Citations: 0

Abstract

We consider gradient coding in the presence of an adversary controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the responses from malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partial gradient is computed by. In this work, we propose a way to reduce the replication to $s+1$ instead of $2s+1$ in the presence of $s$ malicious workers. Our method detects erroneous inputs from the malicious workers, transforming them into erasures. This comes at the expense of $s$ additional local computations at the main node and additional rounds of light communication between the main node and the workers. We define a general framework and give fundamental limits for fractional repetition data allocations. Our scheme is optimal in terms of replication and local computation and incurs a communication cost that is asymptotically, in the size of the dataset, a multiplicative factor away from the derived bound. We furthermore show how additional redundancy can be exploited to reduce the number of local computations and communication cost, or, alternatively, tolerate straggling workers.
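The contrast between replication $2s+1$ and $s+1$ can be illustrated with a toy sketch. This is not the paper's protocol: the scalar least-squares model, the function names, and the rule "recompute the whole block locally on disagreement" are assumptions made for illustration only. The abstract states only that the scheme uses $s$ local computations and additional light communication rounds, without giving details. The sketch shows why $s+1$ copies suffice for detection: at most $s$ responses are corrupted, so if all $s+1$ agree, the agreed value is honest.

```python
from collections import Counter


def partial_gradient(block, w):
    """Gradient of sum 0.5*(w*x - y)**2 over the block w.r.t. scalar w (toy model)."""
    return sum((w * x - y) * x for x, y in block)


def majority_decode(responses):
    """Baseline with replication 2s+1: the honest value holds a strict majority."""
    return Counter(responses).most_common(1)[0][0]


def detect_then_resolve(responses, block, w):
    """Replication s+1: at least one response is honest.

    Returns (value, number_of_local_computations). Recomputing the whole
    block locally is a simplification assumed here, not the paper's method.
    """
    if len(set(responses)) == 1:
        # Unanimous, and one responder is honest, so the value is correct.
        return responses[0], 0
    # Disagreement detected: the main node resolves it with its own computation.
    return partial_gradient(block, w), 1


# Hypothetical data with s = 1 malicious worker sending the value 99.
block = [(1, 2), (2, 3)]
w = 1
honest = partial_gradient(block, w)
baseline = majority_decode([honest, 99, honest])          # 3 = 2s+1 workers
agreed = detect_then_resolve([honest, honest], block, w)  # 2 = s+1 workers
disputed = detect_then_resolve([honest, 99], block, w)    # 2 = s+1 workers
```

In this sketch the main node does extra work only when a disagreement is detected, which mirrors the trade-off described in the abstract: lower replication in exchange for local computation at the main node.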


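Source journal

IEEE Transactions on Information Theory (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 5.70
Self-citation rate: 20.00%
Articles published: 514
Review time: 12 months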

About the journal: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.