A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms

I. Konnov, M. Lazic, H. Veith, Josef Widder
{"title":"A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms","authors":"I. Konnov, M. Lazic, H. Veith, Josef Widder","doi":"10.1145/3009837.3009860","DOIUrl":null,"url":null,"abstract":"Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards. We introduce an automated method for model checking of safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal specification (in a fragment of LTL), then there is a counterexample whose length is bounded and independent of the parameters. We prove this property by (i) characterizing executions depending on the structure of the temporal formula, and (ii) using commutativity of transitions to accelerate and shorten executions. We extended the ByMC toolset (Byzantine Model Checker) with our technique, and verified liveness and safety of 10 prominent fault-tolerant distributed algorithms, most of which were out of reach for existing techniques.","PeriodicalId":20657,"journal":{"name":"Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"66","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3009837.3009860","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 66

Abstract

Distributed algorithms have many mission-critical applications ranging from embedded systems and replicated databases to cloud computing. Due to asynchronous communication, process faults, or network failures, these algorithms are difficult to design and verify. Many algorithms achieve fault tolerance by using threshold guards that, for instance, ensure that a process waits until it has received an acknowledgment from a majority of its peers. Consequently, domain-specific languages for fault-tolerant distributed systems offer language support for threshold guards. We introduce an automated method for model checking of safety and liveness of threshold-guarded distributed algorithms in systems where the number of processes and the fraction of faulty processes are parameters. Our method is based on a short counterexample property: if a distributed algorithm violates a temporal specification (in a fragment of LTL), then there is a counterexample whose length is bounded and independent of the parameters. We prove this property by (i) characterizing executions depending on the structure of the temporal formula, and (ii) using commutativity of transitions to accelerate and shorten executions. We extended the ByMC toolset (Byzantine Model Checker) with our technique, and verified liveness and safety of 10 prominent fault-tolerant distributed algorithms, most of which were out of reach for existing techniques.
一个关于容错分布式算法安全性和活动性验证的简短反例
分布式算法有许多关键任务应用,从嵌入式系统和复制数据库到云计算。由于异步通信、进程故障或网络故障,这些算法难以设计和验证。许多算法通过使用阈值保护来实现容错,例如,确保进程等待,直到它收到来自大多数对等节点的确认。因此,用于容错分布式系统的领域特定语言为阈值保护提供了语言支持。在以进程数量和故障进程比例为参数的系统中,我们介绍了一种用于阈值保护分布式算法的安全性和活动性模型检查的自动化方法。我们的方法基于一个简短的反例属性:如果一个分布式算法违反了时间规范(在LTL的片段中),那么就存在一个反例,其长度是有界的,与参数无关。我们通过(i)根据时间公式的结构来表征执行,以及(ii)使用转换的交换性来加速和缩短执行来证明这一性质。我们用我们的技术扩展了ByMC工具集(Byzantine Model Checker),并验证了10个突出的容错分布式算法的活动性和安全性,其中大多数是现有技术无法达到的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
文献相关原料
公司名称 产品信息 采购帮参考价格
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信