Designing a causally consistent protocol for geo-distributed partial replication

Tyler Crain, M. Shapiro
{"title":"Designing a causally consistent protocol for geo-distributed partial replication","authors":"Tyler Crain, M. Shapiro","doi":"10.1145/2745947.2745953","DOIUrl":null,"url":null,"abstract":"Modern internet applications require scalability to millions of clients, response times in the tens of milliseconds, and availability in the presence of partitions, hardware faults and even disasters. To obtain these requirements, applications are usually geo-replicated across several data centres (DCs) spread throughout the world, providing clients with fast access to nearby DCs and fault-tolerance in case of a DC out-age. Using multiple replicas also has disadvantages, not only does this incur extra storage, bandwidth and hardware costs, but programming these systems becomes more difficult. To address the additional hardware costs, data is often partially replicated, meaning that only certain DCs will keep a copy of certain data, for example in a key-value store it may only store values corresponding to a portion of the keys. Additionally, to address the issue of programming these systems, consistency protocols are run on top ensuring different guarantees for the data, but as shown by the CAP theorem, strong consistency, availability, and partition tolerance cannot be ensured at the same time. For many applications availability is paramout, thus strong consistency is exchanged for weaker consistencies allowing concurrent writes like causal consistency. Unfortunately these protocols are not designed with partial replication in mind and either end up not supporting it or do so in an inefficient manner. In this work we will look at why this happens and propose a protocol designed to support partial replication under causal consistency more efficiently.","PeriodicalId":332245,"journal":{"name":"Proceedings of the First Workshop on Principles and Practice of Consistency for Distributed Data","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2015-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the First Workshop on Principles and Practice of Consistency for Distributed Data","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2745947.2745953","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

Abstract

Modern internet applications require scalability to millions of clients, response times in the tens of milliseconds, and availability in the presence of partitions, hardware faults, and even disasters. To meet these requirements, applications are usually geo-replicated across several data centres (DCs) spread throughout the world, giving clients fast access to a nearby DC and fault tolerance in case of a DC outage. Using multiple replicas also has disadvantages: not only does it incur extra storage, bandwidth, and hardware costs, but programming these systems becomes more difficult. To address the additional hardware costs, data is often partially replicated, meaning that only certain DCs keep a copy of certain data; for example, a DC in a key-value store may only store the values corresponding to a portion of the keys. To address the difficulty of programming these systems, consistency protocols are run on top to provide various guarantees for the data, but as the CAP theorem shows, strong consistency, availability, and partition tolerance cannot all be ensured at the same time. For many applications availability is paramount, so strong consistency is exchanged for weaker consistency models that allow concurrent writes, such as causal consistency. Unfortunately, existing causal-consistency protocols are not designed with partial replication in mind and either do not support it or do so inefficiently. In this work we examine why this happens and propose a protocol designed to support partial replication under causal consistency more efficiently.
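To make the setting concrete, below is a minimal Python sketch of causal consistency over partially replicated data. It is not the protocol proposed in the paper; it is an illustrative baseline that tracks causality with one vector-clock entry per DC and attaches the full dependency vector to every update. All names (PartialReplicaDC, local_write, receive) are hypothetical.

```python
class PartialReplicaDC:
    """A single data centre (DC) that replicates only part of the key space and
    applies remote updates only after their causal dependencies are visible."""

    def __init__(self, dc_id, num_dcs, replicated_keys):
        self.dc_id = dc_id
        self.replicated_keys = set(replicated_keys)  # keys this DC stores
        self.store = {}                              # key -> value (local copies)
        self.clock = [0] * num_dcs                   # clock[i] = updates seen from DC i
        self.pending = []                            # remote updates awaiting dependencies

    def local_write(self, key, value):
        """Apply a client write locally and return the update to propagate to other DCs."""
        assert key in self.replicated_keys
        self.clock[self.dc_id] += 1
        self.store[key] = value
        # The attached dependency vector says: apply this update only after you
        # have seen everything this DC had seen when the write happened.
        return {"origin": self.dc_id, "key": key, "value": value,
                "deps": list(self.clock)}

    def receive(self, update):
        """Buffer a remote update and apply every pending update whose deps are met."""
        self.pending.append(update)
        self._apply_ready()

    def _deps_satisfied(self, u):
        origin = u["origin"]
        for i, dep in enumerate(u["deps"]):
            need = dep - 1 if i == origin else dep  # the update itself fills its own slot
            if self.clock[i] < need:
                return False
        return True

    def _apply_ready(self):
        progress = True
        while progress:
            progress = False
            for u in list(self.pending):
                if self._deps_satisfied(u):
                    self.pending.remove(u)
                    self.clock[u["origin"]] = u["deps"][u["origin"]]
                    # Even when the key is NOT replicated here, the update must
                    # still be counted in the clock so that later updates that
                    # causally depend on it can be applied.
                    if u["key"] in self.replicated_keys:
                        self.store[u["key"]] = u["value"]
                    progress = True


# Two DCs: DC0 replicates {"x", "y"}, DC1 replicates only {"y"}.
dc0 = PartialReplicaDC(0, 2, {"x", "y"})
dc1 = PartialReplicaDC(1, 2, {"y"})

u1 = dc0.local_write("x", 1)  # first write
u2 = dc0.local_write("y", 2)  # causally depends on the write of "x"

dc1.receive(u2)               # arrives out of order: buffered until its dependency is seen
dc1.receive(u1)               # unblocks u2, even though DC1 never stores "x"
print(dc1.store)              # {'y': 2}
```

The sketch also shows where this baseline becomes wasteful under partial replication: DC1 must receive and account for the update to "x" purely to satisfy dependency metadata for a key it never stores. Designing a protocol that avoids this kind of overhead is the problem the paper addresses.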