A Framework for Collecting Provenance in Data-Centric Scientific Workflows

Yogesh L. Simmhan, Beth Plale, Dennis Gannon
2006 IEEE International Conference on Web Services (ICWS'06), 2006-09-18. DOI: 10.1109/ICWS.2006.5
Citations: 130

Abstract

The increasing ability of the Earth sciences to sense the world around us is resulting in a growing need for data-driven applications that are under the control of data-centric workflows composed of grid and Web services. The focus of our work is on provenance collection for these workflows, which is necessary to validate the workflow and to determine the quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework, based on a loosely coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection, while a performance evaluation of a prototype finds minimal performance overhead (in the range of 1% for an eight-service workflow using 271 data products).
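The abstract describes propagating provenance activities over a loosely coupled publish-subscribe architecture. The sketch below illustrates that general idea only and is not the authors' implementation: the `Broker`, `ProvenanceActivity`, and `ProvenanceCollector` names, the activity kinds ("consumed", "produced"), and the single "provenance" topic are hypothetical, and an in-process broker stands in for a networked notification service.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple
import time


@dataclass(frozen=True)
class ProvenanceActivity:
    """One provenance event emitted by a service (hypothetical schema)."""
    workflow_id: str
    service_id: str
    kind: str          # hypothetical kinds: "consumed" or "produced"
    data_id: str = ""  # identifier of the data product involved
    timestamp: float = field(default_factory=time.time)


class Broker:
    """In-process stand-in for a publish-subscribe notification broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[ProvenanceActivity], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[ProvenanceActivity], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, activity: ProvenanceActivity) -> None:
        # Publishers never reference subscribers directly; the broker fans out.
        for handler in self._subscribers[topic]:
            handler(activity)


class ProvenanceCollector:
    """Subscriber that records which data products each service consumed and produced."""

    def __init__(self) -> None:
        self.inputs: Dict[Tuple[str, str], List[str]] = defaultdict(list)
        self.outputs: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def on_activity(self, a: ProvenanceActivity) -> None:
        key = (a.workflow_id, a.service_id)
        if a.kind == "consumed":
            self.inputs[key].append(a.data_id)
        elif a.kind == "produced":
            self.outputs[key].append(a.data_id)

    def derived_from(self, data_id: str) -> List[str]:
        """Return the inputs of the service invocation that produced data_id."""
        for key, produced in self.outputs.items():
            if data_id in produced:
                return list(self.inputs[key])
        return []


# Usage: a hypothetical instrumented "regrid" service publishes as it runs.
broker = Broker()
collector = ProvenanceCollector()
broker.subscribe("provenance", collector.on_activity)
broker.publish("provenance", ProvenanceActivity("wf1", "regrid", "consumed", "obs-001"))
broker.publish("provenance", ProvenanceActivity("wf1", "regrid", "produced", "grid-001"))
print(collector.derived_from("grid-001"))  # ['obs-001']
```

Because services only call `publish`, collectors can be added or removed without touching service code, which is one way such a design can keep the modification burden on service authors low.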