Ferran Badosa, Antonio Espinosa, Gonzalo Vera, A. Ripoll
International Conference on Complex Information Systems. DOI: 10.5220/0006812901480156 (https://doi.org/10.5220/0006812901480156)
Bio-backfill: A Scheduling Policy Enhancing the Performance of Bioinformatics Workflows in Shared Clusters
In this work we present the bio-backfill scheduler, a backfill scheduler for bioinformatics workflow applications running on shared, heterogeneous clusters. Backfill techniques advance low-priority jobs in cluster queues if doing so does not delay higher-priority jobs. They improve the resource utilization and turnaround achieved with classical policies such as First Come First Served or Longest Job First. When attempting to apply backfill techniques such as FirstFit or BestFit to bioinformatics workflows, we found several issues. First, backfill requires runtime predictions, which are particularly difficult to obtain for bioinformatics applications: their performance varies substantially depending on the input datasets and the values of their many configuration parameters. Second, backfill approaches are mainly intended to schedule independent tasks, rather than dependent tasks such as those forming workflows. Third, backfilled jobs are chosen according to their number of processors and runtime length, without considering the slowdown they suffer when the Degree of Multiprogramming of the nodes is greater than 1. To tackle these issues, we developed the bio-backfill scheduler. Based on a predictor that generates performance predictions for each job with multiple resources, and a resource-sharing model that minimizes slowdown, we designed a scheduling algorithm capable of backfilling bioinformatics workflow applications. Our experiments show that our proposal can improve average workflow turnaround by roughly 9% and resource utilization by almost 4%, compared to popular backfill strategies such as FirstFit or BestFit.
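To make the backfill condition concrete, the following is a minimal sketch of a generic first-fit backfill decision in the style of textbook EASY backfilling. It is not the bio-backfill algorithm itself: it ignores workflow dependencies and node sharing, and it assumes that runtime predictions are already available and exact. The names `Job`, `shadow_time_and_extra` and `first_fit_backfill` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int      # processors requested
    runtime: float  # predicted runtime (assumed known and exact in this sketch)


def shadow_time_and_extra(head, free_now, running):
    """Return (shadow_time, extra_procs) for a head job that cannot start yet.

    running is a list of (end_time, procs) pairs for executing jobs.
    shadow_time is the earliest time at which the head job fits;
    extra_procs are the processors left over once the head job starts.
    """
    free = free_now
    for end_time, procs in sorted(running):
        free += procs  # processors released when this job ends
        if free >= head.procs:
            return end_time, free - head.procs
    raise ValueError("head job can never fit on this cluster")


def first_fit_backfill(queue, free_now, running, now=0.0):
    """Return the first queued job that may start now, or None.

    queue is ordered by priority, highest first. A lower-priority job is
    backfilled only if it does not delay the reserved start of the head job.
    """
    head = queue[0]
    if head.procs <= free_now:
        return head  # the head job itself can start; nothing to backfill
    shadow, extra = shadow_time_and_extra(head, free_now, running)
    for job in queue[1:]:
        if job.procs > free_now:
            continue  # does not fit in the processors that are idle now
        finishes_in_time = now + job.runtime <= shadow
        uses_only_extra = job.procs <= extra
        if finishes_in_time or uses_only_extra:
            return job  # first fit: take the first job that qualifies
    return None


# Hypothetical 8-processor cluster: 6 processors busy until t=10, 2 idle.
queue = [Job("A", 8, 5.0), Job("B", 2, 20.0), Job("C", 2, 8.0)]
chosen = first_fit_backfill(queue, free_now=2, running=[(10.0, 6)])
print(chosen.name if chosen else None)
```

In this example job A must wait until t=10 for all 8 processors. Job B fits in the 2 idle processors but would run past t=10 and delay A, so it is skipped; job C finishes at t=8 and is selected. A policy of this kind takes the predicted runtime as given and considers only processor counts and runtime length, which are the limitations the abstract identifies for workflows running on shared nodes.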