César Acevedo, P. Hernández, Antonio Espinosa, Víctor Méndez
A Data-Aware MultiWorkflow Cluster Scheduler
International Conference on Complex Information Systems, published 2016-11-29
DOI: https://doi.org/10.5220/0005932000950102
Citations: 2
Abstract
Previous scheduling research has focused on analyzing the computation time of application workflows. Clusters are now commonly used to execute multiworkflows that may share applications and input files. To reduce the makespan of such multiworkflows, adequate data allocation policies should be applied to reduce input data latency. We propose a multiworkflow scheduling strategy that takes into account where shared input files reside within the storage system of the cluster. To do so, we first merge all the workflows in a study and evaluate the resulting global design pattern. We then apply a classic list scheduling heuristic that considers the location of the input files in the storage system in order to reduce the communication overhead of the applications. We have evaluated our proposal in an initial set of experimental environments, with promising results of up to 20% makespan improvement.
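
The two steps described in the abstract (merging the workflows, then list scheduling with input file locations in mind) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The task model, the names `merge_workflows` and `schedule`, the per-file `transfer_cost`, the "largest cost first" priority rule, and the simplification that each file lives on exactly one node are all assumptions made for illustration.

```python
def merge_workflows(workflows):
    """Merge several workflows into one DAG (hypothetical task model).

    Each workflow maps a task name to {"deps": [...], "inputs": [...], "cost": number}.
    Tasks with the same name are assumed to be the same shared application
    and are kept only once in the merged graph.
    """
    merged = {}
    for wf in workflows:
        for name, task in wf.items():
            if name not in merged:
                merged[name] = {
                    "deps": set(task["deps"]),
                    "inputs": set(task["inputs"]),
                    "cost": task["cost"],
                }
            else:
                merged[name]["deps"] |= set(task["deps"])
                merged[name]["inputs"] |= set(task["inputs"])
    return merged


def schedule(tasks, nodes, file_location, transfer_cost):
    """Greedy list scheduling that accounts for where input files are stored.

    Assumptions: file_location maps a file to the single node holding it,
    and reading a file from any other node adds a fixed transfer_cost.
    Returns (placement, makespan).
    """
    finish = {}                       # task -> finish time
    placement = {}                    # task -> chosen node
    node_free = {n: 0.0 for n in nodes}
    remaining = dict(tasks)
    while remaining:
        # Ready tasks are those whose dependencies have all been scheduled.
        ready = [t for t, info in remaining.items()
                 if all(d in finish for d in info["deps"])]
        if not ready:
            raise ValueError("dependency cycle or missing task")
        # Priority list: largest cost first, name as a deterministic tie-break.
        ready.sort(key=lambda t: (-remaining[t]["cost"], t))
        task = ready[0]
        info = remaining.pop(task)
        deps_done = max((finish[d] for d in info["deps"]), default=0.0)
        best = None
        for n in nodes:
            # Penalise every input file that is not stored on this node.
            remote = sum(transfer_cost for f in info["inputs"]
                         if file_location.get(f) != n)
            end = max(node_free[n], deps_done) + remote + info["cost"]
            if best is None or end < best[0]:
                best = (end, n)
        end, node = best
        finish[task] = end
        placement[task] = node
        node_free[node] = end
    return placement, max(finish.values())


# Two toy workflows sharing the task "align" and the input file "ref.fa".
wf1 = {"align": {"deps": [], "inputs": ["ref.fa"], "cost": 4},
       "call": {"deps": ["align"], "inputs": [], "cost": 2}}
wf2 = {"align": {"deps": [], "inputs": ["ref.fa"], "cost": 4},
       "count": {"deps": ["align"], "inputs": ["ref.fa"], "cost": 1}}
merged = merge_workflows([wf1, wf2])
placement, makespan = schedule(merged, ["n1", "n2"], {"ref.fa": "n2"}, 3)
print(placement, makespan)
```

In this toy case the shared task `align` appears once in the merged graph, and both tasks that read `ref.fa` are placed on the node that holds the file, which avoids the transfer penalty. A real scheduler would need a richer storage model than the single-node file placement assumed here.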