Michael J. Whittaker, N. Giridharan, Adriana Szekeres, J. Hellerstein, H. Howard, Faisal Nawab, I. Stoica
[Solution] Matchmaker Paxos: A Reconfigurable Consensus Protocol
J. Syst. Res., published 2021-09-22. DOI: 10.5070/sr31154842 (https://doi.org/10.5070/sr31154842)
Citations: 0
Abstract
The paper presents Matchmaker Paxos/MultiPaxos, a crash fault-tolerant consensus implementation and state machine replication system with vertical reconfiguration. The main contribution is the reconfiguration protocol, a component of consensus implementations that is critical in practice but often overlooked by the research community. The reviewers agree that the paper is relevant and a valuable addition to the literature on high-performance reconfiguration techniques for consensus.

Abstract

State machine replication protocols, like MultiPaxos and Raft, are at the heart of numerous distributed systems. To tolerate machine failures, these protocols must replace failed machines with new machines, a process known as reconfiguration. Reconfiguration has become increasingly important over time as the need for frequent reconfiguration has grown. Despite this, reconfiguration has largely been neglected in the literature. In this paper, we present Matchmaker Paxos and Matchmaker MultiPaxos, a reconfigurable consensus protocol and a reconfigurable state machine replication protocol, respectively. Our protocols can perform a reconfiguration with little to no impact on the latency or throughput of command processing; they can perform a reconfiguration in a few milliseconds; and they present a framework that can be generalized to other replication protocols in a way that previous reconfiguration techniques cannot. We provide proofs of correctness for the protocols and optimizations, and present empirical results from an open source implementation showing that throughput and latency do not change significantly during a reconfiguration.