Bio-backfill: A Scheduling Policy Enhancing the Performance of Bioinformatics Workflows in Shared Clusters

Ferran Badosa, Antonio Espinosa, Gonzalo Vera, A. Ripoll
DOI: https://doi.org/10.5220/0006812901480156
Published in: International Conference on Complex Information Systems

Abstract

In this work we present the bio-backfill scheduler, a backfill scheduler for bioinformatics workflow applications running on shared, heterogeneous clusters. Backfill techniques advance low-priority jobs in cluster queues if doing so does not delay higher-priority jobs. They improve on the resource utilization and turnaround achieved with classical policies such as First Come First Served or Longest Job First. When attempting to apply backfill techniques such as Firstfit or Bestfit to bioinformatics workflows, we found several issues. First, backfill requires runtime predictions, which are particularly difficult to obtain for bioinformatics applications: their performance varies substantially depending on the input datasets and the values of their many configuration parameters. Second, backfill approaches are mainly intended to schedule independent tasks, rather than dependent tasks such as those forming workflows. Third, backfilled jobs are chosen according to their number of processors and runtime length, without considering the slowdown incurred when the Degree of Multiprogramming of the nodes is greater than 1. To tackle these issues, we developed the bio-backfill scheduler. Based on a predictor that generates performance predictions for each job with multiple resources, and a resource-sharing model that minimizes slowdown, we designed a scheduling algorithm capable of backfilling bioinformatics workflow applications. Our experiments show that our proposal can improve average workflow turnaround by roughly 9% and resource utilization by almost 4%, compared to popular backfill strategies such as Firstfit or Bestfit.
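The baseline backfill condition mentioned in the abstract (advance a lower-priority job only if doing so does not delay a higher-priority one) can be illustrated with a minimal sketch. This is not the bio-backfill algorithm itself, whose predictor and resource-sharing model are not described here. It assumes a single reservation for the blocked head job, homogeneous processors, one job per processor, and exact runtime predictions. The names `Job`, `reservation`, `can_backfill` and `pick_backfill` are hypothetical, and "bestfit" is taken here to mean "the admissible candidate using the most free processors", which is only one of several definitions in use.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    procs: int      # processors requested
    runtime: float  # predicted runtime (assumed exact in this sketch)


def reservation(head: Job, free: int, running: List[Tuple[float, int]],
                now: float = 0.0) -> Tuple[float, int]:
    """Return (shadow_time, extra_procs) for the head job of the queue.

    `running` holds (end_time, procs) pairs for executing jobs.
    shadow_time is the earliest time the head job can start;
    extra_procs are the processors left idle once it has started.
    """
    avail, shadow = free, now
    for end_time, procs in sorted(running):
        if avail >= head.procs:
            break
        avail += procs
        shadow = end_time
    if avail < head.procs:
        raise ValueError("head job requests more processors than exist")
    return shadow, avail - head.procs


def can_backfill(job: Job, free: int, shadow: float, extra: int,
                 now: float = 0.0) -> bool:
    """A job may jump ahead only if it cannot delay the head job."""
    if job.procs > free:
        return False  # does not fit in the currently idle processors
    # Either it ends before the head job starts, or it only uses
    # processors the head job will not need.
    return now + job.runtime <= shadow or job.procs <= extra


def pick_backfill(queue: List[Job], free: int,
                  running: List[Tuple[float, int]],
                  policy: str = "firstfit", now: float = 0.0) -> Optional[Job]:
    """Choose one job behind the blocked head job to start immediately."""
    head, rest = queue[0], queue[1:]
    shadow, extra = reservation(head, free, running, now)
    candidates = [j for j in rest if can_backfill(j, free, shadow, extra, now)]
    if not candidates:
        return None
    if policy == "firstfit":
        return candidates[0]  # first admissible job in queue order
    if policy == "bestfit":
        return max(candidates, key=lambda j: j.procs)  # fills most idle procs
    raise ValueError(f"unknown policy: {policy}")


# 8-processor cluster: one job holds 6 processors until t=10, so 2 are idle.
running = [(10.0, 6)]
queue = [Job("H", 6, 5.0), Job("A", 1, 20.0), Job("B", 2, 8.0), Job("C", 3, 2.0)]
print(pick_backfill(queue, free=2, running=running).name)                    # A
print(pick_backfill(queue, free=2, running=running, policy="bestfit").name)  # B
```

In this example the head job H is reserved to start at t=10 with 2 processors to spare. Job C is rejected because it does not fit in the idle processors, while A and B are both admissible: A because it only uses spare processors, B because it finishes before the reservation. Note that the selection looks only at processor count and runtime, which is exactly the limitation the abstract raises regarding slowdown on shared nodes.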