A Framework for Collecting Provenance in Data-Centric Scientific Workflows
Yogesh L. Simmhan, Beth Plale, Dennis Gannon
2006 IEEE International Conference on Web Services (ICWS'06), 2006-09-18
DOI: 10.1109/ICWS.2006.5 (https://doi.org/10.1109/ICWS.2006.5)
Citation count: 130
Abstract
The increasing ability of the Earth sciences to sense the world around us is creating a growing need for data-driven applications controlled by data-centric workflows composed of grid and Web services. Our work focuses on provenance collection for these workflows, which is necessary to validate a workflow and to determine the quality of the data products it generates. The challenge we address is to record uniform and usable provenance metadata that meets domain needs while minimizing both the modification burden on service authors and the performance overhead on the workflow engine and the services. The framework, based on a loosely coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection, and a performance evaluation of a prototype finds minimal overhead (in the range of 1% for an eight-service workflow using 271 data products).
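To make the publish-subscribe idea concrete, the following is a minimal in-process sketch, not the paper's implementation. It assumes that each service publishes small "provenance activity" notifications to a broker, and that a separate provenance store subscribes to them, so services never call the store directly. All names here (`ProvenanceActivity`, `Broker`, `ProvenanceStore`, `run_service`, the `"provenance"` topic, and the activity kinds) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ProvenanceActivity:
    """One provenance notification (hypothetical schema, not the paper's)."""
    workflow_id: str
    service: str
    kind: str  # assumed kinds: "invoked", "data_consumed", "data_produced"
    data_products: Tuple[str, ...] = ()


class Broker:
    """Toy in-process stand-in for a publish-subscribe notification broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[ProvenanceActivity], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[ProvenanceActivity], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, activity: ProvenanceActivity) -> None:
        # Publishers do not know who is listening: this is the loose coupling.
        for handler in self._subscribers[topic]:
            handler(activity)


class ProvenanceStore:
    """Subscriber that groups received activities by workflow."""

    def __init__(self) -> None:
        self.records: Dict[str, List[ProvenanceActivity]] = defaultdict(list)

    def on_activity(self, activity: ProvenanceActivity) -> None:
        self.records[activity.workflow_id].append(activity)


def run_service(broker: Broker, workflow_id: str, name: str, inputs: List[str]) -> List[str]:
    """Placeholder service: publishes its activities and returns one output name."""
    broker.publish("provenance", ProvenanceActivity(workflow_id, name, "invoked"))
    broker.publish("provenance", ProvenanceActivity(workflow_id, name, "data_consumed", tuple(inputs)))
    outputs = [f"{name}-out"]  # stand-in for real computation
    broker.publish("provenance", ProvenanceActivity(workflow_id, name, "data_produced", tuple(outputs)))
    return outputs


broker = Broker()
store = ProvenanceStore()
broker.subscribe("provenance", store.on_activity)

# A two-service workflow: each service consumes the previous service's output.
products = ["raw-input"]
for service in ["ingest", "transform"]:
    products = run_service(broker, "wf-1", service, products)

for record in store.records["wf-1"]:
    print(record.service, record.kind, record.data_products)
```

In this sketch the only change a service author makes is to add `publish` calls, which mirrors the abstract's goal of a low modification burden. A real deployment would presumably use an asynchronous, networked broker so that publishing does not block the service.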