A Reinforcement Learning Scheduling Strategy for Parallel Cloud-Based Workflows

André Nascimento, Victor Olimpio, V. Silva, A. Paes, Daniel de Oliveira
DOI: 10.1109/IPDPSW.2019.00134
Published in: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2019
Citations: 9

Abstract

Scientific experiments can be modeled as workflows. Such workflows are usually compute- and data-intensive, demanding High-Performance Computing environments such as clusters, grids, and clouds. The latter offers the advantage of elasticity, which allows the number of Virtual Machines (VMs) to be increased or decreased on demand. Workflows are typically managed using Scientific Workflow Management Systems (SWfMSs), many of which support cloud-based execution. Each SWfMS has its own scheduler that follows a well-defined cost function. However, such cost functions must account for the characteristics of a dynamic environment, such as live migrations and performance fluctuations, which are far from trivial to model. This paper proposes a novel scheduling strategy, named ReASSIgN, based on Reinforcement Learning (RL). By relying on an RL technique, one may assume that an optimal (or sub-optimal) solution to the scheduling problem exists and aim at learning the best scheduling from previous executions, without requiring a mathematical model of the environment. To this end, an extension of the well-known workflow simulator WorkflowSim is proposed that implements an RL strategy for scheduling workflows. Once the scheduling plan is generated, the workflow is executed in the cloud using the SciCumulus SWfMS. We conducted a thorough evaluation of the proposed scheduling strategy using a real astronomy workflow.
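To illustrate the general idea of model-free RL for task-to-VM scheduling, the sketch below uses tabular Q-learning to learn an assignment of workflow tasks to VMs that shortens the estimated makespan. This is a minimal illustration only: the state/action encoding, reward (negative makespan), task sizes, VM speeds, and all hyperparameters are assumptions of this sketch, not details taken from ReASSIgN or WorkflowSim.

```python
import random

# Hypothetical sketch: tabular Q-learning for assigning workflow tasks
# to VMs. None of these names or parameters come from the paper.

def simulate_makespan(assignment, task_sizes, vm_speeds):
    """Estimated makespan: sum each VM's task runtimes, take the max."""
    loads = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        loads[vm] += task_sizes[task] / vm_speeds[vm]
    return max(loads)

def q_learning_schedule(task_sizes, vm_speeds, episodes=2000,
                        alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    n_tasks, n_vms = len(task_sizes), len(vm_speeds)
    # State = index of the next task to place; action = chosen VM.
    q = [[0.0] * n_vms for _ in range(n_tasks)]
    for _ in range(episodes):
        assignment = []
        for task in range(n_tasks):
            if rng.random() < epsilon:           # explore
                vm = rng.randrange(n_vms)
            else:                                # exploit
                vm = max(range(n_vms), key=lambda a: q[task][a])
            assignment.append(vm)
        # Terminal-only reward: shorter makespan -> higher reward.
        reward = -simulate_makespan(assignment, task_sizes, vm_speeds)
        for task, vm in enumerate(assignment):
            if task + 1 < n_tasks:
                target = gamma * max(q[task + 1])
            else:
                target = reward
            q[task][vm] += alpha * (target - q[task][vm])
    # Extract a greedy scheduling plan from the learned Q-table.
    return [max(range(n_vms), key=lambda a: q[t][a]) for t in range(n_tasks)]

plan = q_learning_schedule(task_sizes=[4, 2, 8, 1], vm_speeds=[1.0, 2.0])
print(plan)  # one VM index per task
```

The key point the abstract makes is captured here: the agent never sees an analytical model of the environment; it only observes makespans of previously simulated executions and improves its policy from that feedback.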