A Principled Approach to Eventual Consistency

2011 IEEE 20th International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises. Pub Date: 2011-06-27. DOI: 10.1109/WETICE.2011.76

M. Shapiro

Citations: 4

Abstract

Replicating shared data is a fundamental mechanism in large-scale distributed systems, but suffers from a fundamental tension between scalability and data consistency. Eventual consistency sidesteps the (foreground) synchronisation bottleneck, but remains ad-hoc, error-prone, and difficult to prove correct. We present a promising new approach that is simple, scales almost indefinitely, and provably ensures eventual consistency: A CRDT is a data type that demonstrates some simple properties, viz. that its concurrent operations commute, or that its states form a semi-lattice. Any CRDT provably converges, provided all replicas eventually receive all operations. A CRDT requires no synchronisation: an update can execute immediately, irrespective of network latency, faults, or disconnection; it is highly scalable and fault-tolerant. The approach is necessarily limited since any task requiring consensus is out of reach. Nonetheless, many interesting and useful data types can be designed as a CRDT. We previously published the Treedoc CRDT, a sequence data type suited to concurrent editing tasks (as in a p2p wiki). This talk presents a portfolio of generally useful, non-trivial, composable CRDTs, including variations on counters, registers, sets, maps (key-value stores), graphs and sequences. This research is part of a systematic and principled study of CRDTs, to discover their power and limitations, and to better understand the underlying mechanisms and requirements. The challenges ahead include scaling garbage collection and integrating occasional non-commuting operations.
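To make the semi-lattice condition concrete, here is a minimal sketch of a state-based grow-only counter, one of the simplest counter variations of the kind the abstract lists. The class name `GCounter`, its methods, and the replica identifiers are hypothetical names chosen for illustration; they are not taken from the talk. Each replica increments only its own slot, and `merge` takes the per-replica maximum, which is commutative, associative, and idempotent, so replicas converge regardless of delivery order or duplication.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Illustrative grow-only counter: one non-negative slot per replica."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # Local update: executes immediately, with no synchronisation.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all per-replica slots.
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Semi-lattice join: take the per-replica maximum of the two states.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update concurrently, then exchange states in either order.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # a duplicate delivery leaves the state unchanged
```

After the exchange both replicas hold the same state and report the same value. A counter that also supports decrements needs a different design (for example, pairing two such counters), since a per-slot maximum cannot represent a decrease.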

Source journal

2011 IEEE 20th International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises

Self-citation rate

0.00%

Publication count