Nicaea: A Byzantine Fault Tolerant Consensus Under Unpredictable Message Delivery Failures for Parallel and Distributed Computing
Authors: Guanlin Jing, Yifei Zou, Minghui Xu, Yanqiang Zhang, Dongxiao Yu, Zhiguang Shan, Xiuzhen Cheng, Rajiv Ranjan
Journal: IEEE Transactions on Computers, vol. 74, no. 3, pp. 915-928 (journal article, published 2024-11-27)
DOI: 10.1109/TC.2024.3506856
URL: https://ieeexplore.ieee.org/document/10770195/
Journal metrics: impact factor 3.6; JCR Q2 (Computer Science, Hardware & Architecture)
Citations: 0
Abstract
Byzantine fault-tolerant (BFT) consensus is a critical problem in parallel and distributed computing systems, particularly in the presence of potential adversaries. Most prior work on BFT consensus assumes reliable message delivery and tolerates arbitrary failures of fewer than $\frac{n}{3}$ of the $n$ total nodes. However, many systems face unpredictable message delivery failures. This paper investigates the impact of such failures on the BFT consensus problem. We propose Nicaea, a novel protocol that enables the loyal nodes to reach consensus whenever the number of Byzantine nodes is below a new threshold, given by $\frac{\left(2-\rho\right)\left(1-\rho\right)^{2n-2}-1}{\left(2-\rho\right)\left(1-\rho\right)^{2n-2}+1}n$, where $\rho$ denotes the message failure rate. Theoretical proofs and experimental results validate Nicaea's Byzantine resilience. Our findings reveal a fundamental trade-off: as message delivery becomes less reliable, a system's tolerance to Byzantine failures decreases. The well-known $\frac{n}{3}$ threshold under reliable message delivery is the special case $\rho=0$ of our generalized threshold. To the best of our knowledge, this work presents the first quantitative characterization of how unpredictable message delivery failures affect Byzantine fault tolerance in parallel and distributed computing.
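To make the threshold formula concrete, the short Python sketch below simply evaluates the expression exactly as stated in the abstract for a given $n$ and $\rho$. It is not an implementation of the Nicaea protocol; the function name `byzantine_threshold` and its input checks are hypothetical choices for illustration. At $\rho=0$ the shared factor $(2-\rho)(1-\rho)^{2n-2}$ equals 2, so the value is $\frac{n}{3}$. As $\rho$ grows the factor shrinks and so does the bound; once the factor drops to 1 or below, the formula yields a non-positive value, i.e. no room for any Byzantine node.

```python
def byzantine_threshold(n: int, rho: float) -> float:
    """Evaluate the Byzantine-tolerance bound quoted in the Nicaea abstract.

    T(n, rho) = ((2 - rho) * (1 - rho)**(2n - 2) - 1)
                / ((2 - rho) * (1 - rho)**(2n - 2) + 1) * n

    Hypothetical helper: it only computes the formula, per the abstract's
    claim that loyal nodes reach consensus when Byzantine nodes number below T.
    A result <= 0 means the formula admits no Byzantine node at that rate.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho is a failure rate and must lie in [0, 1]")
    # Shared factor (2 - rho)(1 - rho)^(2n - 2); it equals 2 when rho = 0.
    a = (2.0 - rho) * (1.0 - rho) ** (2 * n - 2)
    return (a - 1.0) / (a + 1.0) * n


if __name__ == "__main__":
    # With rho = 0 the bound is n/3; it decreases as rho grows.
    for n in (10, 100):
        for rho in (0.0, 0.001, 0.01):
            print(f"n={n:>3} rho={rho:<6} threshold={byzantine_threshold(n, rho):.3f}")
```

Because the exponent $2n-2$ grows with the system size, even a small failure rate erodes the bound much faster for large $n$ than for small $n$, which is one way to read the trade-off the abstract describes.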
About the journal:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.