Kepler: an extensible system for design and execution of scientific workflows

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date: 2004-06-21. DOI: 10.1109/SSDBM.2004.44

I. Altintas, Chad Berkley, Efrat Jaeger, Matthew B. Jones, Bertram Ludäscher, S. Mock


Citations: 1023

Abstract

Most scientists conduct analyses and run models in several different software and hardware environments, mentally coordinating the export and import of data from one environment to another. The Kepler scientific workflow system provides domain scientists with an easy-to-use yet powerful system for capturing scientific workflows (SWFs). SWFs are a formalization of the ad-hoc process that a scientist may go through to get from raw data to publishable results. Kepler attempts to streamline the workflow creation and execution process so that scientists can design, execute, monitor, re-run, and communicate analytical procedures repeatedly with minimal effort. Kepler is unique in that it seamlessly combines high-level workflow design with execution and runtime interaction, access to local and remote data, and local and remote service invocation. SWFs are superficially similar to business process workflows but have several challenges not present in the business workflow scenario. For example, they often operate on large, complex and heterogeneous data, can be computationally intensive and produce complex derived data products that may be archived for use in reparameterized runs or other workflows. Moreover, unlike business workflows, SWFs are often dataflow-oriented as witnessed by a number of recent academic systems (e.g., DiscoveryNet, Taverna and Triana) and commercial systems (Scitegic/Pipeline-Pilot, Inforsense). In a sense, SWFs are often closer to signal-processing and data streaming applications than they are to control-oriented business workflow applications.
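The dataflow orientation described above can be illustrated with a minimal sketch. It is not Kepler's API: every name here (`source`, `transform`, `keep`, `run_workflow`) is hypothetical, and Python generators stand in for the workflow steps purely to show how data moving between steps, rather than explicit control logic, drives execution.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical names throughout: this is NOT Kepler's API, only an
# illustration of dataflow-style composition using Python generators.


def source(readings: Iterable[float]) -> Iterator[float]:
    """Emit raw data tokens one at a time."""
    for value in readings:
        yield value


def transform(fn: Callable[[float], float],
              upstream: Iterator[float]) -> Iterator[float]:
    """Apply fn to each token as it arrives from upstream."""
    for token in upstream:
        yield fn(token)


def keep(predicate: Callable[[float], bool],
         upstream: Iterator[float]) -> Iterator[float]:
    """Pass through only the tokens that satisfy predicate."""
    for token in upstream:
        if predicate(token):
            yield token


def run_workflow(readings: Iterable[float]) -> List[float]:
    """Wire the steps into a pipeline; each token flows through all steps."""
    celsius = source(readings)
    kelvin = transform(lambda c: c + 273.15, celsius)
    valid = keep(lambda k: k >= 0.0, kelvin)  # drop physically impossible values
    return [round(k, 2) for k in valid]


if __name__ == "__main__":
    print(run_workflow([20.0, -300.0, 0.0]))
```

Each step only consumes tokens from its upstream neighbour and emits tokens downstream, so re-running the workflow with different inputs or a different conversion function means rewiring or reparameterizing steps rather than rewriting control flow.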


