A Task Scheduling Algorithm Based on Replication for Maximizing Reliability on Heterogeneous Computing Systems
Shuli Wang, Kenli Li, Jing Mei, Kuan-Ching Li, Yan Wang
2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014-05-19
DOI: 10.1109/IPDPSW.2014.175 (https://doi.org/10.1109/IPDPSW.2014.175)
Citations: 2
Abstract
Over the past several years, heterogeneous computing (HC) systems have become more competitive than homogeneous systems as commercial computing platforms. As HC systems grow in scale, network failures become inevitable. To achieve high performance, communication reliability should therefore be considered when designing reliability-aware task scheduling algorithms. In this paper, we propose a new algorithm called RMSR (Replication-based scheduling for Maximizing System Reliability), which incorporates task communication into system reliability. To maximize communication reliability, we propose an improved algorithm that searches all optimal-reliability communication paths for the current tasks. During the task replication phase, the task reliability threshold is set by users and each task has a dynamic number of replicas. Comparative studies on randomly generated graphs show that RMSR outperforms existing scheduling algorithms in terms of system reliability. Several factors affecting performance are also analyzed.
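The abstract does not give the details of RMSR, so the following is only a rough illustration of the two ideas it mentions, not the authors' algorithm. It assumes that link and processor failures are independent, that each link or processor has a known reliability in (0, 1], and that a replicated task succeeds if at least one replica succeeds. The function names `most_reliable_path` and `choose_replicas` and the greedy replica selection are hypothetical choices made for this sketch. The path search uses standard Dijkstra on the cost -log(reliability), which finds one most reliable path rather than all of them.

```python
import heapq
import math


def most_reliable_path(links, source, target):
    """Return (reliability, path) for a most reliable route from source to target.

    links maps each node to a dict {neighbor: link_reliability}, with every
    reliability in (0, 1]. Maximizing a product of reliabilities is equivalent
    to minimizing the sum of -log(reliability), so Dijkstra's algorithm applies.
    """
    best_cost = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for neighbor, reliability in links.get(node, {}).items():
            new_cost = cost - math.log(reliability)
            if new_cost < best_cost.get(neighbor, math.inf):
                best_cost[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best_cost:
        return 0.0, []  # target is unreachable
    # Walk the predecessor links back from the target to rebuild the path.
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return math.exp(-best_cost[target]), path


def choose_replicas(candidate_reliabilities, threshold):
    """Greedily add replicas, most reliable first, until the threshold is met.

    Assumes independent failures, so a task with replica reliabilities r_i
    has overall reliability 1 - prod(1 - r_i). If the threshold cannot be
    reached, all candidates are used and the achieved reliability is returned.
    """
    chosen = []
    failure_probability = 1.0
    for reliability in sorted(candidate_reliabilities, reverse=True):
        chosen.append(reliability)
        failure_probability *= 1.0 - reliability
        if 1.0 - failure_probability >= threshold:
            break
    return chosen, 1.0 - failure_probability
```

Under these assumptions, a task whose single best placement already meets the user-defined threshold gets one replica, while a task on less reliable processors gets more, which is one way to read "dynamic replicas" in the abstract.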