A terminology for scientific workflow systems

IF 6.2, Zone 2 (Computer Science), Q1 COMPUTER SCIENCE, THEORY & METHODS
Frédéric Suter, Tainã Coleman, İlkay Altintaş, Rosa M. Badia, Bartosz Balis, Kyle Chard, Iacopo Colonnelli, Ewa Deelman, Paolo Di Tommaso, Thomas Fahringer, Carole Goble, Shantenu Jha, Daniel S. Katz, Johannes Köster, Ulf Leser, Kshitij Mehta, Hilary Oliver, J.-Luc Peterson, Giovanni Pizzi, Loïc Pottier, Rafael Ferreira da Silva
Future Generation Computer Systems-The International Journal of Escience, volume 174, Article 107974. Published 2025-06-24. DOI: 10.1016/j.future.2025.107974. https://www.sciencedirect.com/science/article/pii/S0167739X25002699

Abstract

The term “scientific workflow” has evolved over the last two decades to encompass a broad range of compositions of interdependent compute tasks and data movements. It has also become an umbrella term for processing in modern scientific applications. Today, many scientific applications can be considered workflows made of multiple dependent steps, and hundreds of workflow systems have been developed to manage and run these scientific workflows. However, no turnkey solution has emerged from the field to address the diversity of scientific processes and of the infrastructure on which they are supposed to be implemented. Instead, new research problems that require executing scientific workflows with some novel feature often lead to the development of an entirely new workflow system. As a direct consequence, many existing workflow management systems (WMSs) share some salient features, offer similar functionalities, and can manage the same categories of workflows, yet each also has some distinct capabilities that can be important for specific applications. Researchers who develop workflows therefore face the complex question of selecting a WMS. This selection can be driven by technical considerations, to find the system that is most appropriate for their application and for the computing and storage resources available to them, or by other factors such as reputation, adoption, strong community support, or long-term sustainability. To address this problem, a group of WMS developers and practitioners joined efforts to produce a community-based terminology of WMSs. This paper summarizes their findings and introduces this new terminology to characterize WMSs. The terminology is composed of five axes: workflow structure and characteristics, composition, orchestration, data management, and metadata capture. Each axis comprises several concepts that capture the prominent features of WMSs. Based on this terminology, the paper also presents a classification of 23 existing WMSs according to the proposed axes and terms.
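As a rough illustration of how a five-axis terminology could be used to compare systems, the sketch below records, for each WMS, a set of terms under each axis and computes the terms two systems share. Only the five axis names come from the abstract; the `WmsProfile` class, the `shared_terms` helper, the system names, and the term labels are hypothetical placeholders and do not reflect the paper's actual concepts or its classification of any real WMS.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# The five axis names are taken from the abstract; everything else is assumed.
AXES = (
    "workflow_structure_and_characteristics",
    "composition",
    "orchestration",
    "data_management",
    "metadata_capture",
)


@dataclass
class WmsProfile:
    """Hypothetical record: the terms that apply to one WMS, grouped by axis."""
    name: str
    terms: Dict[str, Set[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject axes that are not part of the five-axis terminology.
        unknown = set(self.terms) - set(AXES)
        if unknown:
            raise ValueError(f"unknown axes: {sorted(unknown)}")
        # Make sure every axis is present, even if no term applies.
        for axis in AXES:
            self.terms.setdefault(axis, set())


def shared_terms(a: WmsProfile, b: WmsProfile) -> Dict[str, Set[str]]:
    """Return, per axis, the terms that both profiles have in common."""
    return {axis: a.terms[axis] & b.terms[axis] for axis in AXES}


# Placeholder systems and term labels, not real classifications.
system_a = WmsProfile(
    "system-a",
    {"composition": {"term-x", "term-y"}, "orchestration": {"term-z"}},
)
system_b = WmsProfile(
    "system-b",
    {"composition": {"term-y"}, "data_management": {"term-w"}},
)
overlap = shared_terms(system_a, system_b)
```

Keeping the terms as per-axis sets makes the "shared features versus distinct capabilities" comparison described in the abstract a plain set intersection or difference; a real tool would replace the placeholder labels with the concepts defined in the paper.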
Source journal
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.