Verify, And Then Trust: Data Inconsistency Detection in ZooKeeper

Proceedings of the 10th Workshop on Principles and Practice of Consistency for Distributed Data

Pub Date: 2023-05-08

DOI: 10.1145/3578358.3591328

Sushant Mane, Fang Lyu, B. Reed



Abstract



ZooKeeper masks crash failures of servers to provide a highly available, distributed coordination kernel; however, in production, not all failures are crash failures. Bugs in underlying software systems and hardware can corrupt the ZooKeeper replicas, leading to data loss. Since ZooKeeper is used as a 'source of truth' for mission-critical applications, it is essential to detect data inconsistencies caused by arbitrary faults to safeguard reliability. Byzantine Fault Tolerance (BFT) protocols promise to handle these problems. However, these protocols are expensive in important dimensions: development, deployment, complexity, and performance. ZooKeeper takes an alternative approach that focuses on detecting faulty behavior rather than tolerating it, and thus provides improved reliability without paying the full expense of BFT protocols. This paper describes various techniques used for detecting data inconsistencies in ZooKeeper. We also analyze the impact of using these techniques on the reliability and performance of the overall system. Our evaluation shows that a real-time digest-based fault detection technique can be employed in production to provide improved reliability with a minimal performance penalty and no additional operational cost. We hope that our analysis and evaluation can help guide the design of next-generation primary-backup systems aiming to provide high reliability.
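The abstract does not spell out how the digest is computed or compared, so the following is only a minimal sketch of the general idea behind real-time digest-based detection, under stated assumptions: each replica keeps a running digest of its data, updates it incrementally on every transaction, and a follower compares its digest with the one the primary produced for the same transaction. The `Replica` class, the `node_hash` and `replicate` helpers, the CRC32-based additive hash, and the `corrupt` fault-injection flag are all hypothetical names and choices made for illustration, not the paper's or ZooKeeper's actual implementation.

```python
import zlib

MASK = (1 << 64) - 1  # keep the running digest within 64 bits


def node_hash(path, data):
    """Hash one (path, data) entry. CRC32 is an illustrative choice only."""
    return zlib.crc32(path.encode("utf-8") + b"\x00" + data)


class Replica:
    """Toy replica: a path -> data map plus an incrementally updated digest."""

    def __init__(self):
        self.tree = {}
        self.digest = 0

    def apply(self, path, data):
        """Apply a set-data transaction and return the digest after it."""
        old = self.tree.get(path)
        if old is not None:
            # Remove the old entry's contribution before adding the new one.
            self.digest = (self.digest - node_hash(path, old)) & MASK
        self.tree[path] = data
        self.digest = (self.digest + node_hash(path, data)) & MASK
        return self.digest


def replicate(leader, follower, path, data, corrupt=False):
    """Apply one transaction on both replicas; True if the digests agree.

    `corrupt` is a hypothetical fault injector: it flips one bit of the
    (non-empty) payload before the follower applies it.
    """
    expected = leader.apply(path, data)
    follower_data = bytes([data[0] ^ 1]) + data[1:] if corrupt else data
    return follower.apply(path, follower_data) == expected


leader, follower = Replica(), Replica()
print(replicate(leader, follower, "/config", b"v1"))                # digests agree
print(replicate(leader, follower, "/config", b"v2", corrupt=True))  # mismatch detected
```

Because the digest is a sum of per-entry hashes, each update costs a constant number of hash computations regardless of how much data the replica holds, which is one plausible reason such a check can run on every transaction with little overhead.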
