Canonical Workflows in Simulation-based Climate Sciences

IF 1.3 · Zone 3 (Computer Science) · Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

Data Intelligence Pub Date : 2022-03-07 DOI:10.1162/dint_a_00127

I. Anders, Karsten Peters-von Gehlen, H. Thiemann

Data Intelligence, vol. 4, pp. 212-225. Journal Article. https://doi.org/10.1162/dint_a_00127

Citations: 2

Abstract

In this paper we derive Canonical Workflow Modules from current workflows in simulation-based climate science, in support of developing a corresponding framework for simulation-based research. We first identified the different users and user groups in simulation-based climate science based on their reasons for using the resources provided at the German Climate Computing Center (DKRZ). The DKRZ is a special case because it provides the climate science community with resources such as high performance computing (HPC), data storage and specialised services, and it also hosts the World Data Center for Climate (WDCC). Users can therefore perform their entire research workflows, up to the publication of the data, on the same infrastructure. Our analysis shows that the resources are used by two primary user types: those who require the HPC system to perform resource-intensive simulations and subsequently analyse them, and those who reuse, build on and analyse existing data. We then further subdivided these top-level user categories based on their specific goals and analysed the typical, idealised workflows they apply to achieve their respective project goals. We find that, because of the subdivision and finer granularity of the user groups, the workflows show apparent differences. Nevertheless, similar “Canonical Workflow Modules” can be clearly identified. These modules are “Data and Software (Re)use”, “Compute”, “Data and Software Storing”, “Data and Software Publication” and “Generating Knowledge”, and in their entirety they form the basis for a Canonical Workflow Framework for Research (CWFR). It is desirable that parts of the workflows in a CWFR act as FAIR Digital Objects (FDOs), but we view this aspect critically. We also reflect on whether the derivation of Canonical Workflow Modules from the analysis of current user behaviour still holds for future systems and work processes.


