Reliable State Machines: A Framework for Programming Reliable Cloud Services

Suvam Mukherjee, N. Raj, Krishnan Govindraj, Pantazis Deligiannis, Chandramouleswaran Ravichandran, A. Lal, Aseem Rastogi, R. Krishnaswamy
European Conference on Object-Oriented Programming. Published 2019-02-25. DOI: 10.4230/LIPIcs.ECOOP.2019.18 (https://doi.org/10.4230/LIPIcs.ECOOP.2019.18)

Abstract

Building reliable applications for the cloud is challenging because failures can occur unpredictably during a program's execution. This paper presents a programming framework called Reliable State Machines (RSMs) that offers fault tolerance by construction. Using our framework, a programmer can build an application as several (possibly distributed) RSMs that communicate with each other via messages, much in the style of actor-based programming. Each RSM is additionally fault-tolerant by design and offers the illusion of being "always-alive". An RSM is guaranteed to process each input request exactly once, as one would expect in a failure-free environment. The RSM runtime automatically persists state and rehydrates it on a failover. We present the core syntax and semantics of RSMs, along with a formal proof of failure-transparency. We provide an implementation of the RSM framework and runtime on the .NET platform for deploying services to Microsoft Azure. We carried out an extensive performance evaluation on micro-benchmarks to show that one can build high-throughput applications with RSMs. We also present a case study in which we rewrote a significant part of a production cloud service using RSMs. The resulting service has simpler code and exhibits production-grade performance.
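
The exactly-once and rehydration guarantees described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the RSM framework or its .NET API: the names `DurableStore`, `CounterMachine`, `handle`, and `run_demo` are hypothetical, the "durable" store is an in-memory object that outlives the simulated crash, and the sketch assumes that the machine state and the set of processed request IDs are committed together atomically.

```python
import copy


class DurableStore:
    """Hypothetical stand-in for persistent storage; it survives simulated crashes."""

    def __init__(self):
        self._snapshot = {"counter": 0, "processed": []}

    def load(self):
        # Return a copy so callers cannot mutate the stored snapshot.
        return copy.deepcopy(self._snapshot)

    def commit(self, snapshot):
        # Assumption: state and the dedup record are saved in one atomic step.
        self._snapshot = copy.deepcopy(snapshot)


class CounterMachine:
    """Toy machine that adds each request's amount to a counter exactly once."""

    def __init__(self, store):
        self._store = store
        snapshot = store.load()  # rehydrate on first start or after a failover
        self.counter = snapshot["counter"]
        self._processed = set(snapshot["processed"])

    def handle(self, request_id, amount, crash_before_commit=False):
        if request_id in self._processed:
            return self.counter  # duplicate delivery: already applied, skip
        new_counter = self.counter + amount
        if crash_before_commit:
            # Simulated failure: nothing has been committed yet.
            raise RuntimeError("simulated crash")
        self._store.commit({
            "counter": new_counter,
            "processed": sorted(self._processed | {request_id}),
        })
        # Update in-memory state only after the commit succeeded.
        self.counter = new_counter
        self._processed.add(request_id)
        return self.counter


def run_demo():
    store = DurableStore()
    machine = CounterMachine(store)
    machine.handle("req-1", 5)
    try:
        machine.handle("req-2", 7, crash_before_commit=True)
    except RuntimeError:
        machine = CounterMachine(store)  # failover: a fresh instance rehydrates
    machine.handle("req-2", 7)  # the sender retries after the crash
    machine.handle("req-2", 7)  # a second, duplicate retry is ignored
    return machine.counter
```

Here `run_demo()` returns 12: the crashed attempt leaves no trace because nothing was committed, the retry is applied once, and the duplicate is filtered by its request ID. In the paper's framework this bookkeeping is the runtime's job; the sketch only shows why persisting state and the record of processed requests together makes a failure look, to the sender, like a delay.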