{"title":"A Novel Scheduling Approach for Spark Workflow Tasks With Deadline and Uncertain Performance in Multi-Cloud Networks","authors":"Kamran Yaseen Rajput;Xiaoping Li;Jinquan Zhang;Abdullah Lakhan","doi":"10.1109/TCC.2024.3449771","DOIUrl":null,"url":null,"abstract":"These days, the usage of cloud computing services for different applications has been growing progressively. The applications, including business, commerce, healthcare, and others, require additional computation capabilities for their executions. To fulfil their expanding computational demands, cloud computing offers a pay-as-you-go billing model to run these applications cost-effectively. However, due to the complex requirements of these applications, more than one cloud system is required because single-cloud solutions are often limited by resource constraints, such as inadequate storage and computing power, as well as single-point failures that can compromise the integrity of the entire application. Consequently, multi-cloud strategies, which provide more scalable storage and computing resources, are becoming increasingly popular. However, the multi-cloud landscape consists of many cloud providers, and effectively managing workflow scheduling presents a significant hurdle in this dynamic environment. This paper focuses on scheduling Spark workflow tasks in multi-cloud networks. It addresses the challenges posed by different pricing models, dynamic resource provisioning, inter- and intra-transmission time, and the instability of resource performance. To solve these challenges, we propose a novel heuristic-based approach that considers different constraints such as VM instances heterogeneity, priority constraints, transmission times, and the impact of performance uncertainty. The goal is to schedule all tasks on virtual machines (VMs) with rental costs as low as possible while meeting workflow deadlines. The simulation results show that the proposed method effectively schedules Spark workflow tasks in multi-cloud networks, improving the scheduling performance by 50% compared to existing approaches.","PeriodicalId":13202,"journal":{"name":"IEEE Transactions on Cloud Computing","volume":"12 4","pages":"1145-1157"},"PeriodicalIF":5.3000,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cloud Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10646491/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
These days, the usage of cloud computing services for different applications has been growing progressively. The applications, including business, commerce, healthcare, and others, require additional computation capabilities for their executions. To fulfil their expanding computational demands, cloud computing offers a pay-as-you-go billing model to run these applications cost-effectively. However, due to the complex requirements of these applications, more than one cloud system is required because single-cloud solutions are often limited by resource constraints, such as inadequate storage and computing power, as well as single-point failures that can compromise the integrity of the entire application. Consequently, multi-cloud strategies, which provide more scalable storage and computing resources, are becoming increasingly popular. However, the multi-cloud landscape consists of many cloud providers, and effectively managing workflow scheduling presents a significant hurdle in this dynamic environment. This paper focuses on scheduling Spark workflow tasks in multi-cloud networks. It addresses the challenges posed by different pricing models, dynamic resource provisioning, inter- and intra-transmission time, and the instability of resource performance. To solve these challenges, we propose a novel heuristic-based approach that considers different constraints such as VM instances heterogeneity, priority constraints, transmission times, and the impact of performance uncertainty. The goal is to schedule all tasks on virtual machines (VMs) with rental costs as low as possible while meeting workflow deadlines. The simulation results show that the proposed method effectively schedules Spark workflow tasks in multi-cloud networks, improving the scheduling performance by 50% compared to existing approaches.
期刊介绍:
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.