Reduction of Workflow Resource Consumption Using a Density-based Clustering Model

Qimin Zhang, Nathaniel Kremer-Herman, Benjamín Tovar, D. Thain
{"title":"Reduction of Workflow Resource Consumption Using a Density-based Clustering Model","authors":"Qimin Zhang, Nathaniel Kremer-Herman, Benjamín Tovar, D. Thain","doi":"10.1109/WORKS.2018.00006","DOIUrl":null,"url":null,"abstract":"Often times, a researcher running a scientific workflow will ask for orders of magnitude too few or too many resources to run their workflow. If the resource requisition is too small, the job may fail due to resource exhaustion; if it is too large, resources will be wasted though job may succeed. It would be ideal to achieve a near-optimal number of resources the workflow runs to ensure all jobs succeed and minimize resource waste. We present a strategy for solving the resource allocation problem: (1) resources consumed by each job are recorded by a resource monitor tool; (2) a density-based clustering model is proposed for discovering clusters in all jobs; (3) a maximal resource requisition is calculated as the ideal number of each cluster. We ran experiments with a synthetic workflow of homogeneous tasks as well as the bioinformatics tools Lifemapper, SHRIMP, BWA and BWA-GATK to capture the inherent nature of resource consumption of a workflow, the clustering allowed by the model, and its usefulness in real workflows. In Lifemapper, the least time saving, cores saving, memory saving, and disk saving are 13.82%, 16.62%, 49.15%, and 93.89%, respectively. In SHRIMP, BWA, and BWA-GATK, the least cores saving, memory saving and disk saving are 50%, 90.14%, and 51.82%, respectively. Compared with fixed resource allocation strategy, our approach provide a noticeable reduction of workflow resource consumption.","PeriodicalId":154317,"journal":{"name":"2018 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/ACM Workflows in Support of Large-Scale Science (WORKS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WORKS.2018.00006","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

Abstract

Oftentimes, a researcher running a scientific workflow will request orders of magnitude too few or too many resources for their workflow. If the resource requisition is too small, jobs may fail due to resource exhaustion; if it is too large, jobs may succeed but resources are wasted. Ideally, the workflow would run with a near-optimal resource allocation that ensures all jobs succeed while minimizing waste. We present a strategy for solving this resource allocation problem: (1) the resources consumed by each job are recorded by a resource monitor tool; (2) a density-based clustering model is proposed to discover clusters among all jobs; (3) a maximal resource requisition is calculated as the ideal allocation for each cluster. We ran experiments with a synthetic workflow of homogeneous tasks as well as the bioinformatics tools Lifemapper, SHRIMP, BWA, and BWA-GATK to capture the inherent nature of a workflow's resource consumption, the clustering allowed by the model, and its usefulness in real workflows. In Lifemapper, the smallest savings in time, cores, memory, and disk are 13.82%, 16.62%, 49.15%, and 93.89%, respectively. In SHRIMP, BWA, and BWA-GATK, the smallest savings in cores, memory, and disk are 50%, 90.14%, and 51.82%, respectively. Compared with a fixed resource allocation strategy, our approach provides a noticeable reduction in workflow resource consumption.
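The abstract does not specify the clustering model beyond "density-based," but the three-step strategy can be illustrated with a minimal sketch, assuming DBSCAN as the density-based method and made-up per-job measurements (cores, memory, disk) standing in for the resource monitor's output; the maximum within each cluster is then taken as that cluster's requisition.

```python
# Illustrative sketch only (not the authors' exact model): cluster per-job
# resource records with DBSCAN and take the per-cluster maximum as the
# resource requisition. The sample data and eps/min_samples values are
# assumptions for demonstration.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Hypothetical resource-monitor output: one row per job
# columns: cores, memory (MB), disk (MB)
jobs = np.array([
    [1,  512,  1000],
    [1,  530,  1100],
    [4, 4096, 20000],
    [4, 4200, 21000],
    [1,  500,   950],
])

# Step 2: density-based clustering on standardized measurements
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(
    StandardScaler().fit_transform(jobs)
)

# Step 3: maximal requisition per cluster (noise points, label -1, excluded)
for cluster in sorted(set(labels) - {-1}):
    peak = jobs[labels == cluster].max(axis=0)
    print(f"cluster {cluster}: cores={peak[0]}, memory={peak[1]} MB, disk={peak[2]} MB")
```

The idea behind taking the per-cluster maximum is that every job in a cluster is guaranteed to fit its allocation, while jobs in small-footprint clusters are no longer sized to the workflow's global peak consumer.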