Flux: Overcoming Scheduling Challenges for Exascale Workflows
D. Ahn, Ned Bass, Albert Chu, J. Garlick, Mark Grondona, Stephen Herbein, Helgi I. Ingólfsson, Joseph Koning, Tapasya Patki, T. Scogland, B. Springmeyer, M. Taufer
DOI: 10.1109/WORKS.2018.00007

Abstract: Many emerging scientific workflows that target high-end HPC systems require complex interplay with the resource and job management software (RJMS). However, portable, efficient, and easy-to-use scheduling and execution of these workflows is still an unsolved problem. We present Flux, a novel, hierarchical RJMS infrastructure that addresses the key scheduling challenges of modern workflows in a scalable, easy-to-use, and portable manner. At the heart of Flux lies its ability to be nested seamlessly within batch allocations created by other schedulers as well as by Flux itself. Once a hierarchy of Flux instances is created within each allocation, its consistent and rich set of well-defined APIs portably and efficiently supports workflows that often feature non-traditional execution patterns such as complex co-scheduling requirements, massive ensembles of small jobs, and coordination among jobs in an ensemble.
{"title":"DagOn*: Executing Direct Acyclic Graphs as Parallel Jobs on Anything","authors":"R. Montella, D. Di Luccio, Sokol Kosta","doi":"10.1109/WORKS.2018.00012","DOIUrl":"https://doi.org/10.1109/WORKS.2018.00012","url":null,"abstract":"The democratization of computational resources, thanks to the advent of public, private, and hybrid clouds, changed the rules in many science fields. For decades, one of the effort of computer scientists and computer engineers was the development of tools able to simplify access to high-end computational resources by computational scientists. However, nowadays any science field can be considered \"computational\" as the availability of powerful, but easy to manage workflow engines, is crucial. In this work, we present DagOn* (Direct acyclic graph On anything), a lightweight Python library implementing a workflow engine able to execute parallel jobs represented by direct acyclic graphs on any combination of local machines, on-premise high performance computing clusters, containers, and cloud-based virtual infrastructures. We use a real-world production-level application for weather and marine forecasts to illustrate the use of this new workflow engine.","PeriodicalId":154317,"journal":{"name":"2018 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117300695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Planner: Cost-Efficient Execution Plans Placement for Uniform Stream Analytics on Edge and Cloud
Laurent Prosperi, Alexandru Costan, Pedro Silva, Gabriel Antoniu
DOI: 10.1109/WORKS.2018.00010

Abstract: Stream processing applications handle unbounded and continuous flows of data items which are generated from multiple geographically distributed sources. Two approaches are commonly used for processing: Cloud-based analytics and Edge analytics. The first one routes the whole data set to the Cloud, incurring significant costs and late results from the high latency networks that are traversed. The latter can give timely results but forces users to manually define which part of the computation should be executed on Edge and to interconnect it with the remaining part executed in the Cloud, leading to sub-optimal placements. In this paper, we introduce Planner, a middleware for uniform and transparent stream processing across Edge and Cloud. Planner automatically selects which parts of the execution graph will be executed at the Edge in order to minimize the network cost. Real-world micro-benchmarks show that Planner reduces the network usage by 40% and the makespan (end-to-end processing time) by 15% compared to state-of-the-art.
WRENCH: A Framework for Simulating Workflow Management Systems
H. Casanova, Suraj Pandey, James Oeth, Ryan Tanaka, F. Suter, Rafael Ferreira da Silva
DOI: 10.1109/WORKS.2018.00013

Abstract: Scientific workflows are used routinely in numerous scientific domains, and Workflow Management Systems (WMSs) have been developed to orchestrate and optimize workflow executions on distributed platforms. WMSs are complex software systems that interact with complex software infrastructures. Most WMS research and development activities rely on empirical experiments conducted with full-fledged software stacks on actual hardware platforms. Such experiments, however, are limited to the hardware and software infrastructures at hand and can be labor- and/or time-intensive. As a result, relying solely on real-world experiments impedes WMS research and development. An alternative is to conduct experiments in simulation. In this work we present WRENCH, a WMS simulation framework whose objectives are (i) accurate and scalable simulations and (ii) easy simulation software development. WRENCH achieves its first objective by building on the SimGrid framework. While SimGrid is recognized for the accuracy and scalability of its simulation models, it only provides low-level simulation abstractions, so large software development efforts are required when implementing simulators of complex systems. WRENCH thus achieves its second objective by providing high-level and directly reusable simulation abstractions on top of SimGrid. After describing and giving rationales for WRENCH's software architecture and APIs, we present a case study in which we apply WRENCH to simulate the Pegasus production WMS. We report on ease of implementation, simulation accuracy, and simulation scalability so as to determine to what extent WRENCH achieves its two objectives. We also draw both qualitative and quantitative comparisons with a previously proposed workflow simulator.
A Practical Roadmap for Provenance Capture and Data Analysis in Spark-Based Scientific Workflows
Thaylon Guedes, V. Silva, M. Mattoso, Marcos V. N. Bedo, Daniel de Oliveira
DOI: 10.1109/WORKS.2018.00009

Abstract: Whenever high-performance computing applications meet data-intensive scalable systems, an attractive approach is the use of Apache Spark for the management of scientific workflows. Spark provides several advantages, such as being widely supported and granting efficient in-memory data management for large-scale applications. However, Spark still lacks support for data tracking and workflow provenance. Additionally, Spark's memory management requires access to all data movements between the workflow activities. Therefore, running legacy programs on Spark is treated as a "black-box" activity, which prevents the capture and analysis of implicit data movements. Here, we present SAMbA, an Apache Spark extension for gathering prospective and retrospective provenance and domain data within distributed scientific workflows. Our approach relies on enveloping both the RDD structure and data contents at runtime so that (i) data consumed and produced within RDD enclosures are captured and registered by SAMbA in a structured way, and (ii) provenance data can be queried during and after the execution of scientific workflows. Following the W3C PROV representation, we model the roles of RDDs with respect to prospective and retrospective provenance data. Our solution provides mechanisms for the capture and storage of provenance data without jeopardizing Spark's performance. The provenance retrieval capabilities of our proposal are evaluated in a practical case study, in which data analytics are provided by several SAMbA parameterizations.
Reduction of Workflow Resource Consumption Using a Density-based Clustering Model
Qimin Zhang, Nathaniel Kremer-Herman, Benjamín Tovar, D. Thain
DOI: 10.1109/WORKS.2018.00006

Abstract: Researchers running a scientific workflow often ask for orders of magnitude too few or too many resources. If the resource requisition is too small, jobs may fail due to resource exhaustion; if it is too large, resources are wasted even though the jobs succeed. Ideally, the workflow would run with a near-optimal amount of resources, ensuring all jobs succeed while minimizing waste. We present a strategy for solving this resource allocation problem: (1) the resources consumed by each job are recorded by a resource-monitor tool; (2) a density-based clustering model is proposed for discovering clusters among the jobs; (3) a maximal resource requisition is calculated as the ideal allocation for each cluster. We ran experiments with a synthetic workflow of homogeneous tasks as well as the bioinformatics tools Lifemapper, SHRIMP, BWA, and BWA-GATK to capture the inherent nature of workflow resource consumption, the clustering allowed by the model, and its usefulness in real workflows. For Lifemapper, the smallest observed savings in time, cores, memory, and disk are 13.82%, 16.62%, 49.15%, and 93.89%, respectively. For SHRIMP, BWA, and BWA-GATK, the smallest observed savings in cores, memory, and disk are 50%, 90.14%, and 51.82%, respectively. Compared with a fixed resource allocation strategy, our approach provides a noticeable reduction of workflow resource consumption.
Dynamic Distributed Orchestration of Node-RED IoT Workflows Using a Vector Symbolic Architecture
Christopher Simpkin, I. Taylor, Daniel Harborne, G. Bent, A. Preece, Ragu K. Ganti
DOI: 10.1109/WORKS.2018.00011

Abstract: There are a large number of workflow systems designed to work in various scientific domains, including support for the Internet of Things (IoT). One such workflow system is Node-RED, which is designed to bring workflow-based programming to IoT. However, the majority of scientific workflow systems, and specifically systems like Node-RED, are designed to operate in a fixed networked environment and rely on a central point of coordination to manage the workflow. The main focus of the work described in this paper is to investigate means whereby we can migrate Node-RED workflows into a decentralized execution environment, so that such workflows can run on Edge networks, where nodes are extremely transient in nature. We demonstrate the feasibility of such an approach by showing how we can migrate a Node-RED-based traffic congestion detection workflow into a decentralized environment. The detection algorithm is implemented as a set of Web services within Node-RED, and we have architected and implemented a system that proxies the centralized Node-RED services using cognitively-aware wrapper services designed to operate in a decentralized environment. Our cognitive services use a Vector Symbolic Architecture (VSA) to semantically represent service descriptions and workflows in a way that can be unraveled on the fly without any central point of control. The VSA-based system is capable of parsing Node-RED workflows and migrating them to a decentralized environment for execution, providing a way to use Node-RED as a front-end graphical composition tool for decentralized workflows.
{"title":"LOS: Level Order Sampling for Task Graph Scheduling on Heterogeneous Resources","authors":"Carl Witt, Sam Wheating, U. Leser","doi":"10.1109/WORKS.2018.00008","DOIUrl":"https://doi.org/10.1109/WORKS.2018.00008","url":null,"abstract":"List scheduling is an approach to task graph scheduling that has been shown to work well for scheduling tasks with data dependencies on heterogeneous resources. Key to the performance of a list scheduling heuristic is its method to prioritize the tasks, and various ranking schemes have been proposed in the literature. We propose a method that combines multiple random rankings instead of a using a deterministic ranking scheme. We introduce L-Orders, which are a subset of all topological orders of a directed acyclic graph. L-Orders can be used to explore targeted regions of the space of all topological orders. Using the observation that the makespans in one such region are often approximately normal distributed, we estimate the expected time to solution improvement in certain regions of the search space. We combine targeted search and improvement time estimations into a time budgeted search algorithm that balances exploration and exploitation of the search space. In 40,500 experiments, our schedules are 5% shorter on average and up to 40% shorter in extreme cases than schedules produced by HEFT.","PeriodicalId":154317,"journal":{"name":"2018 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123539523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Title Page","authors":"","doi":"10.1109/works.2018.00001","DOIUrl":"https://doi.org/10.1109/works.2018.00001","url":null,"abstract":"","PeriodicalId":154317,"journal":{"name":"2018 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130556869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}