How do users design scientific workflows? The Case of Snakemake
Sebastian Pohl, Nourhan Elfaramawy, Kedi Cao, Birte Kehr, Matthias Weidlich
arXiv - CS - Other Computer Science, 2023-09-25, arXiv:2309.14097
Abstract
Scientific workflows automate the analysis of large-scale scientific data,
fostering the reuse of data processing operators as well as the reproducibility
and traceability of analysis results. In exploratory research, however,
workflows are continuously adapted, utilizing a wide range of tools and
software libraries, to test scientific hypotheses. Script-based workflow
engines cater to the required flexibility through direct integration of
programming primitives but lack abstractions for interactive exploration of the
workflow design by a user during workflow execution. To derive requirements for
such interactive workflows, we conduct an empirical study on the use of
Snakemake, a popular Python-based workflow engine. Based on workflows collected
from 1602 GitHub repositories, we present insights on common structures of
Snakemake workflows, as well as the language features typically adopted in
their specification.