Criteria for the Evaluation of Workflow Management Systems for Scientific Data Analysis

Aleyna Dilan Kiran, Mehmet Can Ay, J. Allmer
Journal of bioinformatics and systems biology : Open access, published 2023-01-01
DOI: 10.26502/jbsb.5107055 (https://doi.org/10.26502/jbsb.5107055)

Abstract

Many scientific fields, such as molecular biology, have become dependent on big data and its analysis. For example, precision medicine depends on molecular measurements and data analysis for each patient. Data analyses supporting medical decisions must be standardized and performed consistently across patients. Data analyses in basic research, while perhaps not life-threatening when they go wrong, have also become increasingly complex. The analysis of RNA-seq data, for example, entails multiple steps ranging from quality assessment of the measurements to statistical analyses. Workflow management systems (WFMS) enable the development of data analysis workflows (WF), their reproduction, and their application to datasets of the same type. However, far more than a hundred WFMS are available, and there is no way to convert data analysis WFs from one WFMS to another. Therefore, the initial choice of a WFMS is important, as it entails a lock-in to that system. The reach of a WFMS in its particular field (number of citations) can be used as a proxy for selecting one, but of the roughly 25 WFMS we mention in this work, at least 5 have a large reach in scientific data analysis. Hence, other criteria are needed to differentiate among WFMS. By extracting such criteria from selected studies concerning WFMS and adding further criteria, we arrived at five critical criteria: reproducibility, reusability, FAIRness, versioning support, and security. We deemed another five criteria (providing a graphical user interface, WF flexibility, WF scalability, WF shareability, and computational transparency) important but not critical for the assessment of WFMS. We applied the criteria to the most cited WFMS in PubMed and found none that supports all of them. We hope that suggesting these criteria will spark a discussion on which features are important for WFMS in scientific data analysis and may lead to the development of WFMS that fulfill such criteria.