Algorithms for online fault tolerance server consolidation

IF 7.5 | Region 2 (Computer Science) | JCR Q1 (TELECOMMUNICATIONS)

Digital Communications and Networks | Pub Date: 2025-04-01 | DOI: 10.1016/j.dcan.2024.06.007

Boyu Li, Bin Wu, Meng Shen, Hao Peng, Weisheng Li, Hong Zhang, Jie Gan, Zhihong Tian, Guangquan Xu

Volume 11, Issue 2, Pages 514-523 | Journal Article | https://www.sciencedirect.com/science/article/pii/S2352864824000749

Citations: 0

Abstract



We study a novel replication mechanism that ensures service continuity under multiple simultaneous server failures. In this mechanism, each item represents a computing task and is replicated onto $\xi + 1$ servers for some integer $\xi \geq 1$, with its workload specified by the amount of resources it requires. If one or more servers fail, the affected workloads can be redirected to the other servers that host replicas of the same item, so the service is not interrupted by the failure of up to $\xi$ servers. Consequently, any feasible assignment algorithm must reserve some capacity on each server to absorb the workload redirected from potentially failed servers without overloading, and determining how to reserve this capacity optimally becomes a key issue. Unlike existing algorithms, which assume that no two servers share replicas of more than one item, we first formulate capacity reservation for the general case, in which two servers may share replicas of arbitrarily many items. Because this problem is combinatorial, finding the optimal solution is difficult. To this end, we propose a Generalized and Simple Calculating Reserved Capacity (GSCRC) algorithm, whose time complexity depends only on the number of items packed in the server. Building on GSCRC, we propose a robust replica packing algorithm with capacity optimization (RobustPack), which aims to minimize the number of servers hosting replicas while tolerating multiple server failures. Through theoretical analysis and experimental evaluations, we show that RobustPack can achieve better performance.
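The abstract does not give GSCRC's formula, so the following is only a minimal Python sketch of the underlying capacity-reservation question under an assumed load model: each item's workload is split evenly across its $\xi + 1$ replicas, and when some replica hosts fail, the surviving replicas share the item's workload evenly. The helper names `server_load` and `reserved_capacity` and the `placement`/`sizes` inputs are hypothetical. The sketch computes the reservation by brute force over failure sets, which illustrates the combinatorial blow-up mentioned above; it is not GSCRC or RobustPack.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical load model (an assumption for illustration, not the paper's):
# - item i has total workload sizes[i] and replicas on the servers in placement[i];
# - normally each of its xi + 1 replicas carries sizes[i] / (xi + 1);
# - if k of those replica hosts fail, the xi + 1 - k survivors share sizes[i] evenly.


def server_load(server, placement, sizes, xi, failed):
    """Load on `server` while every server in `failed` is down."""
    load = Fraction(0)
    for item, hosts in placement.items():
        if server not in hosts:
            continue
        down = len(hosts & failed)  # failed replicas of this item
        load += Fraction(sizes[item], xi + 1 - down)
    return load


def reserved_capacity(server, placement, sizes, xi):
    """Worst-case extra load `server` must absorb when up to xi OTHER servers fail.

    Brute force over failure sets: exponential in the number of servers.
    This is an illustration of the problem, not the GSCRC algorithm.
    """
    if any(len(hosts) != xi + 1 for hosts in placement.values()):
        raise ValueError("every item must have exactly xi + 1 replicas")
    others = set().union(*placement.values()) - {server}
    base = server_load(server, placement, sizes, xi, frozenset())
    # In this model, failing an extra server never lowers a survivor's load,
    # so trying failure sets of exactly min(xi, len(others)) servers suffices.
    k = min(xi, len(others))
    worst = base
    for failed in combinations(sorted(others), k):
        worst = max(worst, server_load(server, placement, sizes, xi, frozenset(failed)))
    return worst - base


if __name__ == "__main__":
    # xi = 1: servers A and B share two items (x, y); A and C share one (z).
    placement = {
        "x": frozenset({"A", "B"}),
        "y": frozenset({"A", "B"}),
        "z": frozenset({"A", "C"}),
    }
    sizes = {"x": 4, "y": 2, "z": 4}
    print(reserved_capacity("A", placement, sizes, xi=1))  # prints 3
```

In this toy case, looking at one shared item at a time (the largest single increase on A is 2, from x or z) underestimates the reservation of 3 that A actually needs, because A and B share two items. That is the kind of placement the one-shared-item assumption of earlier work rules out.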


Source journal

Digital Communications and Networks (Computer Science-Hardware and Architecture)

CiteScore: 12.80

Self-citation rate: 5.10%

Articles published: 915

Review time: 30 weeks

Journal description: Digital Communications and Networks is a prestigious journal focusing on communication systems and networks. We publish only top-notch original articles and authoritative reviews, all of which undergo rigorous peer review. All our articles are fully Open Access and can be accessed on ScienceDirect. Our journal is indexed by eminent databases such as the Science Citation Index Expanded (SCIE) and Scopus. In addition to regular articles, we may also consider exceptional conference papers that have been significantly expanded, and we periodically release special issues that focus on specific aspects of the field. In short, Digital Communications and Networks is a leading journal offering high quality and accessibility to researchers and scholars in communication systems and networks.