Fault-Tolerant Scheduling for Scientific Workflows in Cloud Environments

K. Vinay, S.M. Dilip Kumar
Published in: 2017 IEEE 7th International Advance Computing Conference (IACC)
DOI: 10.1109/IACC.2017.0043 (https://doi.org/10.1109/IACC.2017.0043)
Citations: 15

Abstract

Executing clustered tasks has proven to be an efficient way to improve the execution of Scientific Workflows (SWf) on clouds. However, a clustered task has a higher probability of failing than a single task, so fault tolerance is essential when running large-scale scientific applications in the cloud. This paper proposes a new heuristic, the Cluster based Heterogeneous Earliest Finish Time (CHEFT) algorithm, to enhance the scheduling and fault tolerance mechanism for SWf in highly distributed cloud environments. To mitigate failures of clustered tasks, the algorithm uses the idle time of provisioned resources to resubmit failed clustered tasks so that the SWf completes successfully. Experimental results show that the proposed algorithm has a convincing impact on SWf executions and drastically reduces resource waste compared to existing task replication techniques. A trace-based simulation of five real SWf shows that the algorithm is able to sustain unexpected task failures with minimal cost and makespan.
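The abstract does not spell out how idle time is matched to failed clustered tasks, so the following is only an illustrative sketch of the general idea, not the authors' CHEFT implementation. It assumes each provisioned resource exposes idle windows with a relative speed, and it places a failed clustered task in the window that gives the earliest finish time. The names IdleSlot and pick_resubmission_slot, the speed model, and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class IdleSlot:
    """Hypothetical idle window on an already provisioned resource."""
    resource: str  # assumed VM identifier
    start: float   # time at which the resource becomes idle
    end: float     # time at which its next scheduled job begins
    speed: float   # assumed relative speed (1.0 = baseline)


def pick_resubmission_slot(
    work: float, ready_time: float, slots: List[IdleSlot]
) -> Optional[Tuple[IdleSlot, float]]:
    """Return (slot, finish_time) with the earliest finish, or None.

    work is the failed clustered task's runtime on a baseline resource;
    ready_time is when the failure was detected and the task can rerun.
    """
    best: Optional[Tuple[IdleSlot, float]] = None
    for slot in slots:
        begin = max(slot.start, ready_time)   # cannot start before ready
        finish = begin + work / slot.speed    # heterogeneous runtimes
        fits = finish <= slot.end             # must fit in the idle window
        if fits and (best is None or finish < best[1]):
            best = (slot, finish)
    return best


# Made-up idle windows on three provisioned VMs.
slots = [
    IdleSlot("vm-a", start=10.0, end=30.0, speed=1.0),
    IdleSlot("vm-b", start=12.0, end=20.0, speed=2.0),
    IdleSlot("vm-c", start=5.0, end=9.0, speed=1.0),
]
choice = pick_resubmission_slot(work=12.0, ready_time=11.0, slots=slots)
no_fit = pick_resubmission_slot(work=100.0, ready_time=11.0, slots=slots)
```

When no idle window is large enough, the function returns None; a real scheduler would then need a fallback such as provisioning a new resource, which this sketch leaves out.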