Linking multiple workflow provenance traces for interoperable collaborative science

The 5th Workshop on Workflows in Support of Large-Scale Science. Pub Date: 2010-12-17. DOI: 10.1109/WORKS.2010.5671861

P. Missier, Bertram Ludäscher, S. Bowers, Saumen C. Dey, A. Sarkar, B. Shrestha, I. Altintas, M. Anand, C. Goble


Citations: 55

Abstract

Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata which describes the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At the heart lie (i) an abstract workflow and provenance model in which (ii) data sharing becomes itself part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can “stitch together” traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability not only through often elusive workflow standards but through shared provenance information from public repositories.
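
The stitching idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the edge-list trace representation, the `stitch` and `lineage` functions, the `"share"` step naming, and the file names are all hypothetical, chosen only to show how two independently recorded traces could be joined once a published output of one run is known to be the input of another, with the sharing event itself recorded as a step in the combined trace.

```python
from collections import defaultdict

def stitch(trace_a, trace_b, shared):
    """Combine two traces into one edge list (illustrative only).

    trace_a, trace_b: lists of (source, target) dependency edges; each node
    is a data item or step invocation named locally within its trace.
    shared: list of (output_of_a, input_of_b) pairs that refer to the same
    published data item under each system's local identifier.
    """
    # Prefix node names with their trace so local identifiers cannot clash.
    edges = [(("A", s), ("A", t)) for s, t in trace_a]
    edges += [(("B", s), ("B", t)) for s, t in trace_b]
    # Model each sharing event as an explicit step linking the two traces.
    for n, (out_a, in_b) in enumerate(shared):
        share_step = ("share", f"publish_and_reuse_{n}")
        edges.append((("A", out_a), share_step))
        edges.append((share_step, ("B", in_b)))
    return edges

def lineage(edges, node):
    """Return every node that `node` transitively depends on."""
    parents = defaultdict(set)
    for s, t in edges:
        parents[t].add(s)
    seen, stack = set(), [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Hypothetical traces: one run cleans a file, another analyzes the result.
kepler_trace = [("raw.csv", "clean"), ("clean", "cleaned.csv")]
taverna_trace = [("input.csv", "analyze"), ("analyze", "result.txt")]
combined = stitch(kepler_trace, taverna_trace, [("cleaned.csv", "input.csv")])
upstream = lineage(combined, ("B", "result.txt"))
```

In this sketch, a lineage query on the second run's result reaches back through the sharing step to the first run's raw input, which is the "single, seamless experiment" view the abstract describes. A real system would also have to reconcile differing provenance models and data formats, which this example ignores.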

