J. Deslippe, Abdelilah Essiari, S. Patton, T. Samak, C. Tull, A. Hexemer, G. Kumar, D. Parkinson, Polite Stewart
{"title":"Workflow Management for Real-Time Analysis of Lightsource Experiments","authors":"J. Deslippe, Abdelilah Essiari, S. Patton, T. Samak, C. Tull, A. Hexemer, G. Kumar, D. Parkinson, Polite Stewart","doi":"10.1109/WORKS.2014.9","DOIUrl":"https://doi.org/10.1109/WORKS.2014.9","url":null,"abstract":"The Advanced Light Source (ALS) is an X-ray synchrotron facility at Lawrence Berkeley National Laboratory. The ALS generates terabytes of raw and derived data each day and serves thousands of researchers each year. Only a subset of the data is analyzed because of processing barriers that small science teams are ill-equipped to surmount. In this paper, we discuss the development and application of a computational framework, termed SPOT, fed with synchrotron data, powered by storage, networking and compute resources at NERSC and ESnet. We describe issues and recommendations for an end-to-end analysis workflow for ALS data. After one year of operation, the collection contains over 90,000 datasets (550 TB) from 85 users across three beamlines. For 16 months, beamline data taken has been promptly and automatically analyzed and annotated with metadata, allowing users to focus on analysis, conclusions and experiments.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116160513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Gesing, M. Atkinson, Rosa Filgueira, I. Taylor, Andrew C. Jones, V. Stankovski, C. Liew, A. Spinuso, G. Terstyánszky, P. Kacsuk
{"title":"Workflows in a Dashboard: A New Generation of Usability","authors":"S. Gesing, M. Atkinson, Rosa Filgueira, I. Taylor, Andrew C. Jones, V. Stankovski, C. Liew, A. Spinuso, G. Terstyánszky, P. Kacsuk","doi":"10.1109/WORKS.2014.6","DOIUrl":"https://doi.org/10.1109/WORKS.2014.6","url":null,"abstract":"In the last 20 years quite a few mature workflow engines and workflow editors have been developed to support communities in managing workflows. While there is a trend followed by the providers of workflow engines to ease the creation of workflows tailored to their specific workflow system, the management tools still often necessitate much understanding of the workflow concepts and languages. This paper describes the approach targeting various workflow systems and building a single user interface for editing and monitoring workflows under consideration of aspects such as optimization and provenance of data. The design allots agile Web frameworks and novel technologies to build a workflow dashboard offered in a web browser and connecting seamlessly to available workflow systems and external resources like Cloud infrastructures. The user interface eliminates the need to become acquainted with diverse layouts. Thus, the usability is immensely increased for various aspects of managing workflows.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133155696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Workflow Ecosystems through Semantic and Standard Representations","authors":"D. Garijo, Y. Gil, Óscar Corcho","doi":"10.1109/WORKS.2014.13","DOIUrl":"https://doi.org/10.1109/WORKS.2014.13","url":null,"abstract":"Workflows are increasingly used to manage and share scientific computations and methods. Workflow tools can be used to design, validate, execute and visualize scientific workflows and their execution results. Other tools manage workflow libraries or mine their contents. There has been a lot of recent work on workflow system integration as well as common workflow interlinguas, but the interoperability among workflow systems remains a challenge. Ideally, these tools would form a workflow ecosystem such that it should be possible to create a workflow with a tool, execute it with another, visualize it with another, and use yet another tool to mine a repository of such workflows or their executions. In this paper, we describe our approach to create a workflow ecosystem through the use of standard models for provenance (OPM and W3C PROV) and extensions (P-PLAN and OPMW) to represent workflows. The ecosystem integrates different workflow tools with diverse functions (workflow generation, execution, browsing, mining, and visualization) created by a variety of research groups. This is, to our knowledge, the first time that such a variety of workflow systems and functions are integrated.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129509620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
J. R. Balderrama, Matthieu Simonin, L. Ramakrishnan, V. Hendrix, C. Morin, D. Agarwal, Cédric Tedeschi
{"title":"Combining Workflow Templates with a Shared Space-Based Execution Model","authors":"J. R. Balderrama, Matthieu Simonin, L. Ramakrishnan, V. Hendrix, C. Morin, D. Agarwal, Cédric Tedeschi","doi":"10.1109/WORKS.2014.14","DOIUrl":"https://doi.org/10.1109/WORKS.2014.14","url":null,"abstract":"The growth of scientific data has led to data analysis being a critical step in the scientific process. The next generation scientific data analysis environment needs to address two challenges: i) productivity of the end-user and ii) scalability of the workflows. The need to ensure both goals requires us to revisit the design and implementation of workflow tools. In this paper, we study the interaction of Tigres and HOCL-TS towards meeting these goals. Tigres and HOCL-TS have evolved separately; however, their complementary foci allow us to study these issues in greater detail. We describe the pros and cons of an approach that integrates Tigres and HOCL-TS, and an HOCL-TS extension to support common non-functional requirements such as logging and monitoring that can be made available to the users through the Tigres API.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114789959","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"User-Oriented Partial Result Evaluation in Workflow-Based Science Gateways","authors":"M. Jaghoori, Sarang Ramezani, S. Olabarriaga","doi":"10.1109/WORKS.2014.7","DOIUrl":"https://doi.org/10.1109/WORKS.2014.7","url":null,"abstract":"Scientific workflow management systems provide a useful layer for defining and executing applications supported by science gateways. In various optimization or simulation applications that need to run for a long time, the users may be satisfied with an incomplete execution. The system should, therefore, allow users to evaluate partial results of the workflow execution. This entails performing a consolidation step, that would normally run only at the end of the workflow. In this paper, we present two new workflow patterns that formally define how the consolidation step should be executed (on partial inputs) whenever the user proactively requests evaluation of the partial results. This changes the traditional workflow behavior, in which every step runs once, when all its data dependencies are satisfied. We evaluate implementing these patterns in various workflow management systems and finally present a DIRAC-based implementation of this feature for the use case of a molecular docking gateway.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116361638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sensitivity Analysis for Time Dependent Problems: Optimal Checkpoint-Recompute HPC Workflows","authors":"V. Carey, H. Abbasi, I. Rodero, H. Kolla","doi":"10.1109/WORKS.2014.15","DOIUrl":"https://doi.org/10.1109/WORKS.2014.15","url":null,"abstract":"Sensitivity analysis (SA) is a fundamental tool of uncertainty quantification (UQ). Adjoint-based SA is the optimal approach in many large-scale applications, such as the direct numerical simulation (DNS) of combustion. However, one of the challenges of the adjoint workflow for time-dependent applications is the storage and I/O requirements for the application state. During the time-reversal portion of the workflow, forward state is required in last-in-first-out order. The resulting requirements for storage at exascale are enormous. To mitigate this requirement, application state is regenerated from checkpoints over short windows of application time. This approach drastically reduces the total volume of stored data, allows the caching of state in the regeneration window in memory and on local SSDs, may accelerate the application execution by reducing output frequency, and reduces the power overhead from I/O. We explore variations to this workflow, applied to a proxy for the SA of turbulent combustion, by varying checkpoint number, state storage, and other regeneration options to find efficient implementations for minimizing compute time or power consumption.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115141330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Increasing Scientific Workflow Programming Productivity with HyperFlow","authors":"B. Baliś","doi":"10.1109/WORKS.2014.10","DOIUrl":"https://doi.org/10.1109/WORKS.2014.10","url":null,"abstract":"This paper presents HyperFlow: an approach to workflow programming which combines the advantages of a declarative workflow description and low-level scripting programming. The workflow execution model of HyperFlow is based on a formal model of computation - Process Networks. The execution environment is implemented on the basis of a widely adopted runtime platform node.js. Workflow programming benefits from such an approach in multiple ways, including leveraging a large programming ecosystem with many developers, reusable software packages and learning resources; elimination of shim nodes from the workflow graph; and increased reusability of workflow processing components. The HyperFlow workflow programming approach and its advanced capabilities are presented. The HyperFlow engine is also briefly described. Four example workflow applications from various domains, including flood threat assessment, are studied as a demonstration of the HyperFlow programming approach and a comparison with related solutions.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128899421","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Execution Time Estimation for Workflow Scheduling","authors":"A. Chirkin, A. Belloum, S. Kovalchuk, M. Makkes","doi":"10.1109/WORKS.2014.11","DOIUrl":"https://doi.org/10.1109/WORKS.2014.11","url":null,"abstract":"Estimation of the execution time is an important part of the workflow scheduling problem. The aim of this paper is to highlight common problems in estimating the workflow execution time and propose a solution that takes into account the complexity and the randomness of the workflow components and their runtime. The solution proposed in this paper addresses the problems at different levels from task to workflow, including the error measurement and the theory behind the estimation algorithm. The proposed estimation algorithm can be integrated easily into a wide class of schedulers as a separate module. We use a dual stochastic representation, characteristic / distribution functions, in order to combine tasks' estimates into the overall workflow makespan. Additionally, we propose the workflow reductions - the operations on a workflow graph that do not decrease the accuracy of the estimates, but simplify the graph structure, hence increasing the performance of the algorithm.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129660536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Performance Model to Estimate Execution Time of Scientific Workflows on the Cloud","authors":"Ilia Pietri, G. Juve, E. Deelman, R. Sakellariou","doi":"10.1109/WORKS.2014.12","DOIUrl":"https://doi.org/10.1109/WORKS.2014.12","url":null,"abstract":"Scientific workflows, which capture large computational problems, may be executed on large-scale distributed systems such as Clouds. Determining the amount of resources to be provisioned for the execution of scientific workflows is a key component to achieve cost-efficient resource management and good performance. In this paper, a performance prediction model is presented to estimate execution time of scientific workflows for a different number of resources, taking into account their structure as well as their system-dependent characteristics. In the evaluation, three real-world scientific workflows are used to compare the estimated makespan calculated by the model with the actual makespan achieved on different system configurations of Amazon EC2. The results show that the proposed model can predict execution time with an error of less than 20% for over 96.8% of the experiments.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122613885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Srinivasan, G. Juve, Rafael Ferreira da Silva, K. Vahi, E. Deelman
{"title":"A Cleanup Algorithm for Implementing Storage Constraints in Scientific Workflow Executions","authors":"S. Srinivasan, G. Juve, Rafael Ferreira da Silva, K. Vahi, E. Deelman","doi":"10.1109/WORKS.2014.8","DOIUrl":"https://doi.org/10.1109/WORKS.2014.8","url":null,"abstract":"Scientific workflows are often used to automate large-scale data analysis pipelines on clusters, grids, and clouds. However, because workflows can be extremely data-intensive, and are often executed on shared resources, it is critical to be able to limit or minimize the amount of disk space that workflows use on shared storage systems. This paper proposes a novel and simple approach that constrains the amount of storage space used by a workflow by inserting data cleanup tasks into the workflow task graph. Unlike previous solutions, the proposed approach provides guaranteed limits on disk usage, requires no new functionality in the underlying workflow scheduler, and does not require estimates of task runtimes. Experimental results show that this algorithm significantly reduces the number of cleanup tasks added to a workflow and yields better workflow makespans than the strategy currently used by the Pegasus Workflow Management System.","PeriodicalId":206005,"journal":{"name":"2014 9th Workshop on Workflows in Support of Large-Scale Science","volume":"251 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115019565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}