Improving Multisite Workflow Performance Using Model-Based Scheduling

2014 43rd International Conference on Parallel Processing | Pub Date: 2014-10-18 | DOI: 10.1109/ICPP.2014.22

K. Maheshwari, Eun-Sung Jung, Jiayuan Meng, V. Vishwanath, R. Kettimuthu


Citations: 9

Abstract

Workflows play an important role in expressing and executing scientific applications. In recent years, a variety of computational sites and resources have emerged, and users often have access to multiple, geographically distributed resources. These sites are heterogeneous, and the performance of a given workflow task varies from one site to another. Additionally, users typically have a limited resource allocation at each site. In such cases, a judicious scheduling strategy is required to map workflow tasks to resources so that the workload is balanced among sites and data-transfer overhead is minimized. Most existing systems run the entire workflow at a single site, use naive approaches to distribute tasks across sites, or leave it to the user to optimize the allocation of tasks to distributed resources. This results in a significant loss of productivity for scientists. In this paper, we propose a multisite workflow scheduling technique that uses performance models to predict execution time on different resources and dynamic probes to identify the achievable network throughput between sites. We evaluate our approach on real-world applications in a distributed environment using the Swift distributed execution framework and show that it improves execution time by up to 60% compared with the default schedule.
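
To make the idea in the abstract concrete, the following is a minimal illustrative sketch of how a predicted per-task run time and a probed network throughput could be combined to split independent tasks across sites. It is not the scheduling algorithm from the paper, which the abstract does not specify. The greedy heuristic, the `Site` fields, the cost formula, and all numbers are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical description of one computational site."""
    name: str
    slots: int              # tasks that can run concurrently under the allocation
    secs_per_task: float    # run time of one task, as predicted by a performance model
    throughput_mb_s: float  # achievable throughput to the site, as measured by a probe


def estimated_finish(site: Site, n_tasks: int, mb_per_task: float) -> float:
    """Assumed cost model: stage in all input data, then run tasks in waves."""
    if n_tasks == 0:
        return 0.0
    transfer = n_tasks * mb_per_task / site.throughput_mb_s
    waves = -(-n_tasks // site.slots)  # ceiling division
    return transfer + waves * site.secs_per_task


def greedy_schedule(sites: list[Site], n_tasks: int, mb_per_task: float) -> dict[str, int]:
    """Place tasks one at a time on the site with the earliest estimated finish."""
    counts = {s.name: 0 for s in sites}
    for _ in range(n_tasks):
        best = min(
            sites,
            key=lambda s: estimated_finish(s, counts[s.name] + 1, mb_per_task),
        )
        counts[best.name] += 1
    return counts


if __name__ == "__main__":
    # Made-up sites and workload, for illustration only.
    sites = [
        Site("A", slots=4, secs_per_task=10.0, throughput_mb_s=100.0),
        Site("B", slots=2, secs_per_task=20.0, throughput_mb_s=50.0),
    ]
    print(greedy_schedule(sites, n_tasks=6, mb_per_task=100.0))
```

The sketch treats tasks as independent and identical. A real workflow has data dependencies between tasks, and a scheduler would also need to account for them.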

