On optimization of scientific workflows to support streaming applications in distributed network environments
Qi-Shi Wu, Yi Gu, Xukang Lu, Mengxia Zhu, P. Brown, Wuyin Lin, Yangang Liu. The 5th Workshop on Workflows in Support of Large-Scale Science (WORKS 2010), 17 December 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671851

Abstract: Large-scale data-intensive streaming applications in various science fields feature complex DAG-structured workflows comprised of distributed computing modules with intricate inter-module dependencies. Supporting such workflows in high-performance network environments and optimizing their throughput are crucial to collaborative scientific exploration and discovery. We formulate workflow mapping as a frame rate optimization problem and propose an efficient heuristic solution, which is integrated into the Condor-based Scientific Workflow Automation and Management Platform (SWAMP) in place of Condor's default mapping scheme. The SWAMP system is also augmented with several new components to improve the workflow management process. The performance superiority of the proposed solution is verified using both simulations and a real-life scientific workflow for climate modeling deployed in a distributed heterogeneous network environment.
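The frame-rate objective mentioned in the abstract can be illustrated with a toy model: in a pipelined workflow mapping, the steady-state frame rate is bounded by the slowest stage. The numbers and the simple max-based bottleneck model below are illustrative assumptions, not the paper's actual heuristic.

```python
# Toy illustration of the frame-rate objective for a pipelined workflow
# mapping. Stage times are invented; this is not the paper's heuristic.

def frame_rate(stages):
    """Steady-state frame rate of a pipeline is limited by its slowest
    stage: rate = 1 / max(compute_time, transfer_time) over all stages."""
    bottleneck = max(max(s["compute"], s["transfer"]) for s in stages)
    return 1.0 / bottleneck

# Each stage: module compute time on its assigned node, plus the time to
# ship one frame of data over the link to the next node (seconds).
mapping = [
    {"compute": 0.8, "transfer": 0.2},
    {"compute": 1.5, "transfer": 0.5},   # bottleneck stage
    {"compute": 0.6, "transfer": 0.1},
]
print(frame_rate(mapping))  # bounded by the 1.5 s stage: 1/1.5 frames/s
```

A mapping heuristic in this setting would search over module-to-node assignments to raise the bottleneck stage's rate, rather than minimizing total latency.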
A history-tracing XML-based provenance framework for workflows
M. Gerhards, A. Belloum, F. Berretz, V. Sander, S. Skorupa. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671873

Abstract: The ability to validate and reproduce the outcome of computational processes is fundamental to many application domains. Assuring the provenance of workflows will likely become even more important as emerging standards such as WS-HumanTask incorporate human tasks into standard workflows. This paper addresses this trend with an actor-based workflow approach that actively supports provenance. It proposes a framework, applicable to various workflow management systems, that tracks and stores provenance information automatically. In particular, the framework supports the documentation of workflows in a legally binding way, using the concept of layered XML documents, i.e. history-tracing XML. It also enables the executors (actors) of a particular workflow task to attest their operations and the associated results by integrating digital XML signatures.
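The layering idea can be sketched very simply: each workflow step wraps the previous provenance document inside a new outer element and attests it. The element names below are hypothetical, and a SHA-256 digest stands in for a real XML digital signature; this is a structural illustration, not the paper's schema.

```python
# Minimal sketch of "history-tracing XML": each step nests the previous
# provenance document inside a new, attested layer. Element names are
# invented, and a hash stands in for a proper XML signature.
import hashlib
import xml.etree.ElementTree as ET

def add_layer(prev_xml: str, actor: str, operation: str) -> str:
    digest = hashlib.sha256(prev_xml.encode()).hexdigest()
    layer = ET.Element("layer", {"actor": actor, "operation": operation,
                                 "prev_digest": digest})
    layer.append(ET.fromstring(prev_xml))  # history is nested, never replaced
    return ET.tostring(layer, encoding="unicode")

doc = "<workflow name='run-42'/>"
doc = add_layer(doc, "alice", "preprocess")
doc = add_layer(doc, "bob", "analyze")
# The outermost layer transitively covers the whole history: altering any
# inner layer invalidates every digest attested above it.
```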
Taming complex bioinformatics workflows with Weaver, Makeflow, and Starch
A. Thrasher, Rory Carmichael, Peter Bui, Li Yu, D. Thain, S. Emrich. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671858

Abstract: In this paper we discuss challenges of common bioinformatics applications when deployed outside their initial development environments. We propose a three-tiered approach to mitigate some of these issues by leveraging an encapsulation tool, a high-level workflow language, and a portable intermediary. As a case study, we apply this approach to refactor a custom EST analysis pipeline. The Starch tool encapsulates program dependencies to simplify task specification and deployment. The Weaver language provides abstractions for distributed computing and naturally encourages code modularity. The Makeflow workflow engine provides a batch-system-agnostic engine to execute compiled Weaver code. To illustrate the benefits of our framework, we compare implementations, show their performance, and discuss benefits derived from our new workflow approach relative to traditional bioinformatics development.
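Makeflow uses a Make-like rule syntax: each rule names its output files, its input files, and the command that produces the former from the latter. The fragment below is a hypothetical two-stage split/analyze/merge pipeline with invented file and program names, just to show the shape of such a file.

```make
# Hypothetical Makeflow fragment (file and program names invented):
# rule form is  outputs : inputs  followed by a tab-indented command.
part.1.fa part.2.fa: ests.fa
	./split_fasta ests.fa 2

hits.1.txt: part.1.fa
	./analyze part.1.fa > hits.1.txt

hits.2.txt: part.2.fa
	./analyze part.2.fa > hits.2.txt

report.txt: hits.1.txt hits.2.txt
	cat hits.1.txt hits.2.txt > report.txt
```

Because the two `analyze` rules share no dependencies, the engine can dispatch them concurrently to whichever batch system is configured, without any change to the workflow file — the "batch-system-agnostic" property the abstract refers to.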
Network resource selection for data transfer processes in scientific workflows
Zhiming Zhao, P. Grosso, R. Koning, J. van der Ham, C. de Laat. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671840

Abstract: Quality of service (QoS) plays an important role throughout the life-cycle of scientific workflows for composing and executing applications. However, the quality of network services has so far rarely been considered in composing and executing scientific workflows. Currently, scientific applications tune execution quality by selecting only optimal software services and computing resources, neglecting network resources. One reason is that IP-based networks give workflow systems few possibilities to manage service quality, limiting or preventing bandwidth reservation and network path selection. We nonetheless see a strong need from scientific applications, and from network operators, to include network quality management in workflow systems. In this paper, we discuss our ongoing research on this issue and present a semantic-based solution for searching network resources with awareness of QoS requirements. The solution aims to complement existing workflow systems in selecting network resources in the context of workflow composition, scheduling, and execution when advanced network services are available. Our research is conducted in the context of the CineGrid project.
BReW: Blackbox resource selection for e-Science workflows
Yogesh L. Simmhan, Emad Soroush, C. Ingen, Deb Agarwal, L. Ramakrishnan. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671857

Abstract: Workflows are commonly used to model data-intensive scientific analysis. As computational resource needs increase for eScience, emerging platforms like clouds present additional resource choices for scientists and policy makers. We introduce BReW, a tool that enables users to make rapid, high-level platform selections for their workflows using limited workflow knowledge. This helps them make informed decisions on whether to port a workflow to a new platform. Our analysis of synthetic and real eScience workflows shows that, using just total runtime length, maximum task fanout, and total data used and produced by the workflow, BReW can provide platform predictions comparable to whitebox models with detailed workflow knowledge.
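A blackbox comparison using only those three coarse features might be shaped like the toy model below. The cost formula, the platform parameters, and all numbers are invented for illustration; BReW's actual model is not described here.

```python
# Toy "blackbox" platform comparison using only the three coarse features
# named in the abstract. All parameters are made up for illustration and
# are not BReW's actual model.

def estimate_makespan(total_cpu_hours, max_fanout, data_gb, platform):
    slots = min(max_fanout, platform["cores"])      # usable parallelism
    compute = total_cpu_hours / slots               # ideal parallel compute
    staging = data_gb / platform["io_gb_per_hour"]  # move data in and out
    return compute + staging                        # hours

cluster = {"cores": 32,  "io_gb_per_hour": 200}  # fast local I/O, few cores
cloud   = {"cores": 256, "io_gb_per_hour": 50}   # many cores, slow WAN I/O

for name, p in [("cluster", cluster), ("cloud", cloud)]:
    print(name, estimate_makespan(100, 64, 500, p))
```

Even this crude model captures the key trade-off such a tool must weigh: a data-heavy workflow can finish sooner on a smaller cluster with fast I/O than on a larger cloud behind a slow transfer path.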
The 5th Workshop on Workflows in Support of Large-Scale Science in conjunction with SC 10
E. Deelman, I. Taylor. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671879

Abstract: Scientific workflows are a key technology that enables large-scale computations and service management on distributed resources. Workflows enable scientists to design complex analyses composed of individual application components or services, and often such components and services are designed, developed, and tested collaboratively.
Streaming satellite data to cloud workflows for on-demand computing of environmental data products
Daniel Zinn, Q. Hart, Bertram Ludäscher, Yogesh L. Simmhan. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671841

Abstract: Environmental data arriving constantly from satellites and weather stations are used to compute weather coefficients that are essential for agriculture and viticulture. For example, the reference evapotranspiration (ET0) coefficient, overlaid on regional maps, is provided each day by the California Department of Water Resources to local farmers and turf managers to plan daily water use. Scaling out single-processor compute/data-intensive applications operating on realtime data to support more users and higher-resolution data poses data engineering challenges. Cloud computing helps data providers expand resource capacity to meet growing needs, besides supporting scientific needs like reprocessing historic data using new models. In this article, we examine the migration of a legacy script used for daily ET0 computation by CIMIS to a workflow model that eases deployment to, and scaling on, the Windows Azure Cloud. Our architecture incorporates a direct streaming model into Cloud virtual machines (VMs) that improves performance for our workflow by 130% to 160% over the common approach of staging data through Cloud storage. The streaming workflows achieve runtimes comparable to desktop execution for single VMs and a linear speed-up when using multiple VMs, thus allowing computation of environmental coefficients at a much larger resolution than done presently.
Linking multiple workflow provenance traces for interoperable collaborative science
P. Missier, Bertram Ludäscher, S. Bowers, Saumen C. Dey, A. Sarkar, B. Shrestha, I. Altintas, M. Anand, C. Goble. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671861

Abstract: Scientific collaboration increasingly involves data sharing between separate groups. We consider a scenario where data products of scientific workflows are published and then used by other researchers as inputs to their workflows. For proper interpretation, shared data must be complemented by descriptive metadata. We focus on provenance traces, a prime example of such metadata, which describe the genesis and processing history of data products in terms of the computational workflow steps. Through the reuse of published data, virtual, implicitly collaborative experiments emerge, making it desirable to compose the independently generated traces into global ones that describe the combined executions as single, seamless experiments. We present a model for provenance sharing that realizes this holistic view by overcoming the various interoperability problems that emerge from the heterogeneity of workflow systems, data formats, and provenance models. At the heart lie (i) an abstract workflow and provenance model in which (ii) data sharing becomes itself part of the combined workflow. We then describe an implementation of our model that we developed in the context of the Data Observation Network for Earth (DataONE) project and that can "stitch together" traces from different Kepler and Taverna workflow runs. It provides a prototypical framework for seamless cross-system, collaborative provenance management and can be easily extended to include other systems. Our approach also opens the door to new ways of workflow interoperability, not only through often elusive workflow standards but through shared provenance information from public repositories.
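The "stitching" idea can be sketched as joining two derivation graphs on the identifier of a shared published data product. The trace contents and identifiers below are invented examples, and the flat edge-list representation is a deliberate simplification of real provenance models.

```python
# Sketch of stitching two independently produced provenance traces into one
# graph by joining on the id of the shared (published) data product.
# Trace contents and identifiers are invented examples.

def stitch(*traces):
    """Each trace is a list of (source, step, target) derivations; the
    combined trace is their union, linked wherever a data id appears as a
    target in one trace and a source in another."""
    return [edge for trace in traces for edge in trace]

def lineage(combined, data_id):
    """Walk derivations backwards from data_id across trace boundaries."""
    history = []
    for src, step, tgt in combined:
        if tgt == data_id:
            history.append((src, step, tgt))
            history.extend(lineage(combined, src))
    return history

kepler_trace  = [("raw.csv", "clean", "doi:10.x/clean.csv")]
taverna_trace = [("doi:10.x/clean.csv", "align", "result.nex")]

combined = stitch(kepler_trace, taverna_trace)
# The lineage query crosses the system boundary through the shared DOI:
print(lineage(combined, "result.nex"))
```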
Workflow-based comparison of two Distributed Computing Infrastructures
J. Montagnat, T. Glatard, Damien Reimert, K. Maheshwari, E. Caron, F. Desprez. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671856

Abstract: Porting applications to Distributed Computing Infrastructures (DCIs) is eased by the use of workflow abstractions. Yet, estimating the impact of the execution DCI on application performance is difficult due to the heterogeneity of the available resources, middleware, and operation models. This paper describes a workflow-based experimental method to acquire objective performance comparison criteria when dealing with completely different DCIs. Experiments were conducted on the European EGI and the French Grid'5000 infrastructures to highlight raw performance variations and identify their causes. The results also show that it is possible to conduct experiments on a production infrastructure with reproducibility similar to that of an experimental platform.
Using SchedFlow for performance evaluation of workflow applications
Gustavo Martínez, E. Heymann, Miguel Angel Senar, E. Luque, B. Miller. WORKS 2010. DOI: https://doi.org/10.1109/WORKS.2010.5671864

Abstract: Computational science increasingly relies on the execution of workflows in distributed networks to solve complex applications. However, the heterogeneity of resources in these environments complicates resource management and the scheduling of such applications. Sophisticated scheduling policies are being developed for workflows, but they have had little impact in practice because their integration into existing workflow engines is complex and time-consuming, as each policy has to be individually ported to a particular workflow engine. In addition, choosing a particular scheduling policy is difficult, as factors like machine availability, workload, and communication volume between tasks are hard to predict. In this paper, we describe SchedFlow, a tool that integrates scheduling policies into workflow engines such as Taverna, DAGMan, or Karajan. We show how SchedFlow was used to take advantage of different scheduling policies at different times, depending on the dynamic workload of the workflows. Our experiments included two real workflow applications and four different scheduling policies. We show that no single scheduling policy is best for all scenarios, so tools like SchedFlow can improve performance by providing flexibility when scheduling workflows.