{"title":"How Fast Reads Affect Multi-Valued Register Simulations","authors":"S. Chaudhuri, Reginald Frank, J. Welch","doi":"10.1145/3293611.3331580","DOIUrl":"https://doi.org/10.1145/3293611.3331580","url":null,"abstract":"We consider the problem of simulating a k-valued register in a wait-free manner using binary registers as building blocks, where k > 2. We show that for any simulation using atomic binary base registers to simulate a safe k-valued register in which the read algorithm takes the optimal number of steps (log_2 k), the write algorithm must take at least log_2 k steps in the worst case. A fortiori, the same lower bound applies when the simulated register should be regular. Previously known algorithms show that both these lower bounds are tight. We also show that in order to simulate an atomic k-valued register for two readers, the optimal number of steps for the read algorithm must be strictly larger than log_2 k.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130333954","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Sharp Threshold Phenomenon for the Distributed Complexity of the Lovász Local Lemma","authors":"S. Brandt, Yannic Maus, Jara Uitto","doi":"10.1145/3293611.3331636","DOIUrl":"https://doi.org/10.1145/3293611.3331636","url":null,"abstract":"The Lovász Local Lemma (LLL) says that, given a set of bad events that depend on the values of some random variables and where each event happens with probability at most p and depends on at most d other events, there is an assignment of the variables that avoids all bad events if the LLL criterion ep(d+1)<1 is satisfied. Nowadays, in the area of distributed graph algorithms it has also become a powerful framework for developing---mostly randomized---algorithms. A classic result by Moser and Tardos yields an O(log^2 n) algorithm for the distributed Lovász Local Lemma [JACM'10] if ep(d + 1) < 1 is satisfied. Given a stronger criterion, i.e., demanding a smaller error probability, it is conceivable that we can find better algorithms. Indeed, for example Chung, Pettie and Su [PODC'14] gave an O(log_epd^2 n) algorithm under the epd^2 < 1 criterion. Going further, Ghaffari, Harris and Kuhn introduced a 2^O(√(log log n))-time algorithm given d^8 p = O(1) [FOCS'18]. On the negative side, Brandt et al. and Chang et al. showed that we cannot go below Ω(log log n) (randomized) [STOC'16] and Ω(log n) (deterministic) [FOCS'16], respectively, under the criterion p ≤ 2^-d. Furthermore, there is a lower bound of Ω(log^* n) that holds for any criterion. In this paper, we study the dependency of the distributed complexity of the LLL problem on the chosen LLL criterion. We show that for the fundamental case of each random variable of the considered LLL instance being associated with an edge of the input graph, that is, each random variable influences at most two events, a sharp threshold phenomenon occurs at p = 2^-d: we provide a simple deterministic (!) algorithm that matches the Ω(log^* n) lower bound in bounded degree graphs, if p < 2^-d, whereas for p ≥ 2^-d, the Ω(log log n) randomized and the Ω(log n) deterministic lower bounds hold. In many applications variables affect more than two events; our main contribution is to extend our algorithm to the case where random variables influence at most three different bad events. We show that, surprisingly, the sharp threshold occurs at the exact same spot, providing evidence for our conjecture that this phenomenon always occurs at p = 2^-d, independent of the number r of events that are affected by a variable. Almost all steps of the proof framework we provide for the case r=3 extend directly to the case of arbitrary r; consequently, our approach serves as a step towards characterizing the complexity of the LLL under different exponential criteria.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129569176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Coded State Machine -- Scaling State Machine Execution under Byzantine Faults","authors":"Songze Li, Saeid Sahraei, Mingchao Yu, A. Avestimehr, Sreeram Kannan, P. Viswanath","doi":"10.1145/3293611.3331573","DOIUrl":"https://doi.org/10.1145/3293611.3331573","url":null,"abstract":"We introduce Coded State Machine (CSM), an information-theoretic framework to securely and efficiently execute multiple state machines on Byzantine nodes. The standard method of solving this problem is using State Machine Replication, which achieves high security at the cost of low efficiency. CSM simultaneously achieves the optimal linear scaling in storage, throughput, and security with increasing network size. The storage is scaled via the design of Lagrange coded states and coded input commands that require the same storage size as their origins. The computational efficiency is scaled using a novel delegation algorithm, called INTERMIX, which is an information-theoretically verifiable matrix-vector multiplication algorithm of independent interest.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129878140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Stabilizing Snapshot Objects for Asynchronous Failure-Prone Networked Systems","authors":"Chryssis Georgiou, Oskar Lundström, E. Schiller","doi":"10.1145/3293611.3331584","DOIUrl":"https://doi.org/10.1145/3293611.3331584","url":null,"abstract":"A snapshot object simulates the behavior of an array of single-writer/multi-reader shared registers that can be read atomically. Delporte-Gallet et al. proposed two fault-tolerant algorithms for snapshot objects in asynchronous crash-prone message-passing systems. Their first algorithm is non-blocking; it allows snapshot operations to terminate once all write operations have ceased. It uses O(n) messages of O(n v) bits, where n is the number of nodes and v is the number of bits it takes to represent the object. Their second algorithm allows snapshot operations to always terminate independently of write operations. It incurs O(n^2) messages. The fault model of Delporte-Gallet et al. considers node failures (crashes). We aim at the design of even more robust snapshot objects. We do so through the lenses of self-stabilization---a very strong notion of fault-tolerance. In addition to Delporte-Gallet et al.'s fault model, a self-stabilizing algorithm can recover after the occurrence of transient faults; these faults represent arbitrary violations of the assumptions according to which the system was designed to operate (as long as the code stays intact). In particular, in this work, we propose self-stabilizing variations of Delporte-Gallet et al.'s non-blocking algorithm and always-terminating algorithm. Our algorithms have similar communication costs to the ones by Delporte-Gallet et al. and O(1) recovery time (in terms of asynchronous cycles) from transient faults. The main differences are that our proposal considers repeated gossiping of O(v)-bit messages and deals with bounded space, which is a prerequisite for self-stabilization.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126952886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantum Distributed Algorithm for the All-Pairs Shortest Path Problem in the CONGEST-CLIQUE Model","authors":"Taisuke Izumi, F. Gall","doi":"10.1145/3293611.3331628","DOIUrl":"https://doi.org/10.1145/3293611.3331628","url":null,"abstract":"The All-Pairs Shortest Path problem (APSP) is one of the most central problems in distributed computation. In the CONGEST-CLIQUE model, in which n nodes communicate with each other over a fully connected network by exchanging messages of O(log n) bits in synchronous rounds, the best known general algorithm for APSP uses Õ(n^{1/3}) rounds. Breaking this barrier is a fundamental challenge in distributed graph algorithms. In this paper we investigate for the first time quantum distributed algorithms in the CONGEST-CLIQUE model, where nodes can exchange messages of O(log n) quantum bits, and show that this barrier can be broken: we construct an Õ(n^{1/4})-round quantum distributed algorithm for the APSP over directed graphs with polynomial weights in the CONGEST-CLIQUE model. This speedup in the quantum setting contrasts with the case of the standard CONGEST model, for which Elkin et al. (PODC 2014) showed that quantum communication does not offer significant advantages over classical communication. Our quantum algorithm is based on a relationship discovered by Vassilevska Williams and Williams (JACM 2018) between the APSP and the detection of negative triangles in a graph. The quantum part of our algorithm exploits the framework for quantum distributed search recently developed by Le Gall and Magniez (PODC 2018). Our main technical contribution is a method showing how to implement multiple quantum searches (one for each edge in the graph) in parallel without introducing congestion.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129271863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconfigurable Atomic Transaction Commit","authors":"Manuel Bravo, Alexey Gotsman","doi":"10.1145/3293611.3331590","DOIUrl":"https://doi.org/10.1145/3293611.3331590","url":null,"abstract":"Modern data stores achieve scalability by partitioning data into shards and fault-tolerance by replicating each shard across several servers. A key component of such systems is a Transaction Certification Service (TCS), which atomically commits a transaction spanning multiple shards. Existing TCS protocols require 2f+1 crash-stop replicas per shard to tolerate f failures. In this paper we present atomic commit protocols that require only f+1 replicas and reconfigure the system upon failures using an external reconfiguration service. We furthermore rigorously prove that these protocols correctly implement a recently proposed TCS specification. We present protocols in two different models---the standard asynchronous message-passing model and a model with Remote Direct Memory Access (RDMA), which allows a machine to access the memory of another machine over the network without involving the latter's CPU. Our protocols are inspired by a recent FARM system for RDMA-based transaction processing. Our work codifies the core ideas of FARM as distributed TCS protocols, rigorously proves them correct and highlights the trade-offs required by the use of RDMA.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115249359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Use of Randomness in Local Distributed Graph Algorithms","authors":"M. Ghaffari, F. Kuhn","doi":"10.1145/3293611.3331610","DOIUrl":"https://doi.org/10.1145/3293611.3331610","url":null,"abstract":"We attempt to better understand randomization in local distributed graph algorithms by exploring how randomness is used and what we can gain from it: We first ask the question of how much randomness is needed to obtain efficient randomized algorithms. We show that for all locally checkable problems with poly log n-time randomized algorithms, there are such algorithms even if either (I) there is only a single (private) independent random bit in each poly log n-neighborhood of the graph, (II) the (private) bits of randomness of different nodes are only poly log n-wise independent, or (III) there are only poly log n bits of global shared randomness (and no private randomness). Second, we study how much we can improve the error probability of randomized algorithms. For all locally checkable problems with poly log n-time randomized algorithms, we show that there are such algorithms that succeed with probability 1 - n^{-2^{ε (log log n)^2}} and more generally T-round algorithms, for T ≥ poly log n, with success probability 1 - n^{-2^{ε log^2 T}}. We also show that poly log n-time randomized algorithms with success probability 1 - 2^{-2^{log^ε n}} for some ε > 0 can be derandomized to poly log n-time deterministic algorithms. Both of the directions mentioned above, reducing the amount of randomness and improving the success probability, can be seen as partial derandomization of existing randomized algorithms. In all the above cases, we also show that any significant improvement of our results would lead to a major breakthrough, as it would imply significantly more efficient deterministic distributed algorithms for a wide class of problems.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115302090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Impact of RDMA on Agreement","authors":"M. Aguilera, N. Ben-David, R. Guerraoui, Virendra J. Marathe, I. Zablotchi","doi":"10.1145/3293611.3331601","DOIUrl":"https://doi.org/10.1145/3293611.3331601","url":null,"abstract":"Remote Direct Memory Access (RDMA) is becoming widely available in data centers. This technology allows a process to directly read and write the memory of a remote host, with a mechanism to control access permissions. In this paper, we study the fundamental power of these capabilities. We consider the well-known problem of achieving consensus despite failures, and find that RDMA can improve the inherent trade-off in distributed computing between failure resilience and performance. Specifically, we show that RDMA allows algorithms that simultaneously achieve high resilience and high performance, while traditional algorithms had to choose one or another. With Byzantine failures, we give an algorithm that only requires n ≥ 2f_P + 1 processes (where f_P is the maximum number of faulty processes) and decides in two (network) delays in common executions. With crash failures, we give an algorithm that only requires n ≥ f_P + 1 processes and also decides in two delays. Both algorithms tolerate a minority of memory failures inherent to RDMA, and they provide safety in asynchronous systems and liveness with standard additional assumptions.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129182351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Counting the Population Size","authors":"P. Berenbrink, Dominik Kaaser, T. Radzik","doi":"10.1145/3293611.3331631","DOIUrl":"https://doi.org/10.1145/3293611.3331631","url":null,"abstract":"We consider the problem of counting the population size in the population model. In this model, we are given a distributed system of n identical agents which interact in pairs with the goal to solve a common task. In each time step, the two interacting agents are selected uniformly at random. In this paper, we consider so-called uniform protocols, where the actions of two agents upon an interaction may not depend on the population size n. We present two population protocols to count the size of the population: protocol Approximate, which computes with high probability either ⌊log n⌋ or ⌈log n⌉, and protocol CountExact, which computes the exact population size in optimal O(log n) interactions, using Õ(n) states. Both protocols can also be converted to stable protocols that give a correct result with probability 1 by using an additional multiplicative factor of O(log n) states.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130558877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Mixing Eventual and Strong Consistency: Bayou Revisited","authors":"Maciej Kokociński, Tadeusz Kobus, P. Wojciechowski","doi":"10.1145/3293611.3331583","DOIUrl":"https://doi.org/10.1145/3293611.3331583","url":null,"abstract":"In this paper we study the properties of eventually consistent distributed systems that feature arbitrarily complex semantics and mix eventual and strong consistency. These systems execute requests in a highly-available, weakly-consistent fashion, but also enable stronger guarantees through additional inter-replica synchronization mechanisms that require the ability to solve distributed consensus. We use the seminal Bayou system as a case study, and then generalize our findings to a whole class of systems. We show dubious and unintuitive behaviour exhibited by those systems and provide a theoretical framework for reasoning about their correctness. We also state an impossibility result that formally proves the inherent limitation of such systems, namely temporary operation reordering, which admits interim disagreement between replicas on the relative order in which the client requests were executed.","PeriodicalId":153766,"journal":{"name":"Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123902430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}