Fault Tolerant Gradient Clock Synchronization

Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing. Pub Date: 2019-02-21. DOI: 10.1145/3293611.3331637

J. Bund, C. Lenzen, Will Rosenbaum


Citations: 7

Abstract

Synchronizing clocks in distributed systems is well understood, both in terms of fault tolerance in fully connected systems and in terms of the optimal achievable local skew in general fault-free networks. However, so far nothing non-trivial is known about the local skew that can be achieved in non-fully-connected topologies, even under a single Byzantine fault. In this work, we show that asymptotically optimal local skew can be achieved in the presence of Byzantine faults. Our approach combines the Lynch-Welch algorithm [19] for synchronizing a clique of n nodes with up to f < n/3 Byzantine faults, and the gradient clock synchronization (GCS) algorithm by Lenzen et al. [15], in order to render the latter resilient to faults. This is not possible on general graphs, so we augment an arbitrary input graph G by replacing each node with a fully connected cluster of 3f + 1 copies, and execute an instance of the Lynch-Welch algorithm within each cluster. We interpret the clusters as supernodes executing the GCS algorithm on G, where each node in the cluster maintains an estimate of the logical clock of its supernode. By also fully connecting clusters corresponding to neighbors in G, supernodes maintain estimates of neighboring clusters' logical clocks. We achieve asymptotically optimal local skew, assuming that no cluster contains more than f faulty nodes. This construction yields overhead factors of O(f) and O(f^2) in terms of nodes and edges, respectively. Since tolerating f faulty neighbors trivially requires degrees larger than f, these overheads are asymptotically optimal.
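
The graph augmentation described in the abstract can be illustrated with a minimal sketch. It covers only the topology construction, not the Lynch-Welch or GCS algorithms themselves. The function name `augment`, the `(v, i)` labeling of cluster copies, and the input representation (a list of nodes plus a list of undirected edges, each listed once) are assumptions made for illustration and do not come from the paper.

```python
from itertools import combinations


def augment(nodes, edges, f):
    """Build the augmented graph sketched in the abstract (illustrative only).

    Each node v of G becomes a cluster of k = 3f + 1 copies labeled (v, i).
    Copies within a cluster are fully connected, and the clusters of any two
    neighbors in G are fully connected to each other.
    """
    k = 3 * f + 1
    new_nodes = [(v, i) for v in nodes for i in range(k)]
    new_edges = set()

    # Intra-cluster edges: a clique on the k copies of every node.
    for v in nodes:
        for i, j in combinations(range(k), 2):
            new_edges.add(((v, i), (v, j)))

    # Inter-cluster edges: every copy of u is joined to every copy of v
    # whenever {u, v} is an edge of G (assumes u != v, each edge listed once).
    for u, v in edges:
        for i in range(k):
            for j in range(k):
                new_edges.add(((u, i), (v, j)))

    return new_nodes, new_edges


if __name__ == "__main__":
    # Path a - b - c with f = 1, so each cluster has 4 copies.
    g_nodes = ["a", "b", "c"]
    g_edges = [("a", "b"), ("b", "c")]
    aug_nodes, aug_edges = augment(g_nodes, g_edges, f=1)
    print(len(aug_nodes), len(aug_edges))  # prints "12 50"
```

For an input graph with n nodes and m edges, this sketch produces n(3f + 1) nodes and n(3f + 1)(3f)/2 + m(3f + 1)^2 edges, which matches the O(f) and O(f^2) overhead factors stated in the abstract.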
