Revisiting the Paxos Foundations: A Look at Summer Internship Work at VMware Research
H. Howard, D. Malkhi, A. Spiegelman
ACM SIGOPS Oper. Syst. Rev. (Journal Article), published 2017-09-11
DOI: 10.1145/3139645.3139656 (https://doi.org/10.1145/3139645.3139656)
Abstract
The summer of 2016 was buzzing with intern activity at the VMware Research Group (VRG), with interns working alongside the entire research team and David Tennenhouse, Chief Research Officer of VMware. In this paper, we give a brief introduction to Flexible Paxos [4], one of the internship results. There were several other exciting outcomes; internships are a great way to participate in driving innovation at VMware! Flexible Paxos introduces a surprising observation concerning the foundations of distributed computing. The observation revisits the basic requisites of Paxos [7, 8], Lamport's widely adopted algorithmic foundation for fault tolerance and replication, and a pinnacle of the work behind his Turing Award [1]. Since its publication, Paxos has been widely built upon in teaching, research and production systems. Paxos implements a fault-tolerant state machine among a group of nodes. At its core, Paxos uses two phases, each of which requires agreement from a subset of nodes (known as a quorum) to proceed. Throughout this manuscript, we will refer to the first phase as the leader election phase, and the second as the replication phase. The safety and liveness of Paxos are based on the guarantee that any two quorums will intersect. To satisfy this requirement, quorums are typically composed of any majority from a fixed set of nodes, although other quorum schemes have been proposed. In practice, we usually wish to reach agreement over a sequence of commands, not just one. This is often referred to as the Multi-Paxos problem [3]. In Multi-Paxos, we use the leader election phase of Paxos to establish one node as a leader for all future commands, until it is replaced by another leader. We use the replication phase of Paxos to agree on a series of commands, one at a time. To commit a command, the leader must always communicate with at least a quorum of nodes and wait for them to accept the value. In the Flexible Paxos work, we observe that Paxos is conservative:
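As an illustration of the quorum intersection requirement described above, the following is a minimal sketch (not taken from the paper) that enumerates the majority quorums of a small cluster and checks by brute force that every pair shares at least one node. The node names and the helper functions `majority_quorums` and `all_pairs_intersect` are hypothetical and chosen only for this example.

```python
from itertools import combinations


def majority_quorums(nodes):
    """Return every subset of `nodes` that contains a strict majority."""
    threshold = len(nodes) // 2 + 1
    return [
        frozenset(subset)
        for size in range(threshold, len(nodes) + 1)
        for subset in combinations(nodes, size)
    ]


def all_pairs_intersect(quorums):
    """Return True if every pair of quorums shares at least one node."""
    return all(a & b for a, b in combinations(quorums, 2))


# Hypothetical five-node cluster; the names are assumptions for illustration.
nodes = ["n1", "n2", "n3", "n4", "n5"]
quorums = majority_quorums(nodes)

# Any two majorities of the same fixed node set must overlap.
print(all_pairs_intersect(quorums))

# By contrast, sets of exactly half of a four-node cluster can be disjoint,
# e.g. {a, b} and {c, d}, so they do not form a valid quorum system here.
halves = [frozenset(s) for s in combinations(["a", "b", "c", "d"], 2)]
print(all_pairs_intersect(halves))
```

The brute-force check is only practical for small clusters; it is meant to make the intersection property concrete, not to suggest how a production system would validate its quorum configuration.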