Leveraging Distributed Systems for Fault-Tolerant Cloud Computing: A Review of Strategies and Frameworks

Academic Journal of Nawroz University. Pub Date: 2024-05-10. DOI: 10.25007/ajnu.v13n2a2012

Saman M. Almufti, Subhi R. M. Zeebaree

Abstract

Ensuring system availability and reliability is crucial in the rapidly developing field of cloud computing. As organizations rely more heavily on cloud infrastructure to support critical operations, fault tolerance in that infrastructure becomes increasingly important. This article examines cloud computing and distributed systems, focusing on the forms of cloud computing, fault tolerance methods, and the frameworks that make cloud services robust and durable.

Cloud computing has transformed how organizations and individuals access and administer computing resources. The paper discusses several deployment options, including public, private, hybrid, and multi-cloud environments, which offer organizations flexibility, scalability, and cost-effectiveness. This flexibility makes cloud computing well suited to a wide range of applications, from website hosting to complex data analytics.

Despite these benefits, cloud computing faces substantial obstacles, notably the need to maintain uninterrupted service in the face of hardware failures, network outages, and software errors. Fault tolerance is therefore pivotal to maintaining the dependability and availability of the system.

The primary objective of this study is to examine how distributed systems can be used to strengthen fault tolerance in cloud computing. Distributed systems are well suited to this problem because of their intrinsic ability to divide workloads and data across multiple nodes. This approach relies on redundancy, replication, and seamless recovery from disruptions, thereby improving the resilience and resource efficiency of cloud services. The paper reviews novel techniques and frameworks that use distributed systems to build fault-tolerant cloud computing architectures and emphasizes their substantial influence on the cloud computing domain. It concludes with a comparative analysis table covering twenty previous works.
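As a rough illustration of the redundancy, replication, and recovery ideas named in the abstract, the following minimal Python sketch stores each key on several nodes and lets reads fall back to a surviving replica. It is not taken from the paper: the names `Node`, `ReplicatedStore`, and `NodeDown`, the CRC32-based placement, and the default replication factor of 2 are all assumptions chosen for illustration.

```python
import zlib
from dataclasses import dataclass, field


class NodeDown(Exception):
    """Raised when a request reaches a node that is marked as failed."""


@dataclass
class Node:
    # Hypothetical in-memory stand-in for a storage server.
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)

    def put(self, key, value):
        if not self.alive:
            raise NodeDown(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.alive:
            raise NodeDown(self.name)
        return self.data[key]


class ReplicatedStore:
    """Writes each key to `replicas` nodes; reads fail over between them."""

    def __init__(self, nodes, replicas=2):
        if not 1 <= replicas <= len(nodes):
            raise ValueError("replicas must be between 1 and len(nodes)")
        self.nodes = nodes
        self.replicas = replicas

    def _owners(self, key):
        # Deterministic placement (an assumption of this sketch): hash the
        # key to a starting node, then take the next `replicas` nodes in order.
        start = zlib.crc32(key.encode()) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # Redundancy: attempt the write on every replica, skipping failed nodes.
        written = 0
        for node in self._owners(key):
            try:
                node.put(key, value)
                written += 1
            except NodeDown:
                continue
        if written == 0:
            raise NodeDown("no replica available for write")
        return written

    def get(self, key):
        # Recovery: return the value from the first replica that can serve it.
        for node in self._owners(key):
            try:
                return node.get(key)
            except (NodeDown, KeyError):
                continue
        raise KeyError(key)


if __name__ == "__main__":
    store = ReplicatedStore([Node("a"), Node("b"), Node("c")], replicas=2)
    store.put("user:1", "alice")
    store._owners("user:1")[0].alive = False  # simulate a node failure
    print(store.get("user:1"))  # still served by the second replica
```

The sketch deliberately leaves out concerns a production system would need, such as repairing replicas that missed a write while they were down, or resolving conflicting versions.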
