OSIRIS-SR: a scalable yet reliable distributed workflow execution engine

SWEET '13. Publication date: 2013-06-23. DOI: 10.1145/2499896.2499899
Nenad Stojnic, H. Schuldt
Citations: 7

Abstract

Workflows provide an easy-to-use programming model for constructing complex services that are (recursively) composed of simpler services. For high-performance workflow execution, distributing (scaling out) the constituent services of a workflow across a set of computational nodes is a key concept and a straightforward advantage of the workflow paradigm. However, scalable workflow execution cannot be achieved by distributing services alone; it also requires novel architectures for the workflow engine in charge of service orchestration. Workflow orchestration is commonly provided by centralized solutions, but these architectures introduce performance bottlenecks and single points of failure. Hence, the workflow engine has to be distributed as well, by efficiently replicating workflow metadata across several nodes in a network. A particular challenge is providing workflow execution that is both scalable and reliable. In this paper, we present OSIRIS-SR, a decentralized middleware for the distributed execution of workflows, designed specifically to provide a high degree of scalability and reliability together. Locally, OSIRIS-SR leverages the concurrent and redundant Actor model for workflow processing; globally, it runs a number of scalable system services for managing workflow metadata, the most prominent being the Safety Ring. The Safety Ring service features a self-healing node overlay that actively supervises workflow instances and at the same time serves as scalable and reliable metadata storage. We discuss in detail the Safety Ring architecture and the mechanics behind scalable and reliable workflow management in OSIRIS-SR. The evaluation results show that support for reliable workflow execution does not significantly impact the system's scalability characteristics.
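
The abstract does not specify how the Safety Ring replicates metadata or heals itself, so the following is only a toy illustration of the general idea of a ring overlay that keeps workflow metadata on several nodes and re-replicates it when a node fails. It is not the OSIRIS-SR implementation: the names `Node`, `Ring`, `replicas`, and the key-placement rule are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Hypothetical store: workflow instance id -> metadata
    store: dict = field(default_factory=dict)


class Ring:
    """Toy ring overlay (an assumption, not the paper's design):
    each entry is kept on `replicas` consecutive nodes."""

    def __init__(self, names, replicas=2):
        self.nodes = [Node(n) for n in sorted(names)]
        self.replicas = replicas

    def _owners(self, key):
        # Deterministic placement: derive a start position from the key,
        # then take the next `replicas` nodes clockwise around the ring.
        start = sum(key.encode()) % len(self.nodes)
        count = min(self.replicas, len(self.nodes))
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(count)]

    def put(self, key, metadata):
        for node in self._owners(key):
            node.store[key] = metadata

    def get(self, key):
        for node in self._owners(key):
            if key in node.store:
                return node.store[key]
        raise KeyError(key)

    def fail(self, name):
        # "Self-healing" step: drop the failed node, collect what the
        # survivors still hold, and re-replicate under the new membership.
        survivors = [n for n in self.nodes if n.name != name]
        entries = {}
        for node in survivors:
            entries.update(node.store)
            node.store = {}
        self.nodes = survivors
        for key, metadata in entries.items():
            self.put(key, metadata)


ring = Ring(["a", "b", "c", "d"], replicas=2)
ring.put("wf-1", {"state": "running"})
holder = next(n.name for n in ring.nodes if "wf-1" in n.store)
ring.fail(holder)            # one replica holder crashes
print(ring.get("wf-1"))      # metadata is still reachable via the other replica
```

In this sketch a single node failure never loses an entry as long as `replicas` is at least 2 and the ring heals before the next failure; a real system would also need failure detection and concurrent-update handling, which are deliberately left out here.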