Progressive datacenter recovery over optical core networks after a large-scale disaster

Sifat Ferdousi, F. Dikbiyik, M. Tornatore, B. Mukherjee
Published in: 2016 12th International Conference on the Design of Reliable Communication Networks (DRCN)
Publication date: 2016-03-15
DOI: 10.1109/DRCN.2016.7470834 (https://doi.org/10.1109/DRCN.2016.7470834)
Citations: 9

Abstract

Today's cloud systems are composed of geographically distributed datacenters interconnected by high-speed optical networks. Disaster failures can severely affect both the communication network and the datacenter infrastructure, preventing users from accessing cloud services. After large-scale disasters, recovery efforts on both the network and the datacenters may take days and, in some cases, weeks or months. Traditionally, the repair of the communication network has been treated as a separate problem from the repair of datacenters. While past research has mostly focused on network recovery, how to efficiently recover a cloud system while jointly considering the limited computing and networking resources has been an important open research problem. In this work, we investigate the problem of progressive datacenter recovery after a large-scale disaster failure, given that a network-recovery plan has already been made. We explore an efficient recovery plan that determines which datacenters should be recovered at each recovery stage to maximize cumulative content reachability from any source, subject to the limited available network resources. We devise an Integer Linear Program (ILP) formulation to model the associated optimization problem. Our numerical examples using the ILP show that an efficient progressive datacenter-recovery plan can significantly increase the reachability of contents during the network-recovery phase. Compared to a random-recovery strategy, our plan increases the number of important contents reachable in the early stages of recovery, at the cost of a slight increase in resource consumption.
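To make the objective concrete, the following is a minimal toy sketch of stage-by-stage datacenter recovery. It is not the ILP from the paper: it brute-forces all recovery orders on a three-datacenter instance. The data (`hosted`, `weight`, `reachable_at`), the function names, and the one-datacenter-per-stage repair budget are all assumptions made for illustration. The given network-recovery plan is reduced here to a per-stage set of datacenters that the network has reconnected, and link capacity is ignored.

```python
from itertools import permutations

# Hypothetical toy instance (not from the paper).
hosted = {"A": {"c1", "c2"}, "B": {"c2", "c3"}, "C": {"c3"}}  # contents per datacenter
weight = {"c1": 5, "c2": 1, "c3": 3}                          # content importance
# Datacenters reconnected by the (given) network-recovery plan at each stage.
reachable_at = [{"B", "C"}, {"A", "B", "C"}, {"A", "B", "C"}]


def cumulative_reachability(order, hosted, weight, reachable_at):
    """Sum over stages of the total weight of contents reachable at that stage.

    order[s] is the datacenter repaired at stage s (assumed budget: one per stage).
    """
    recovered = set()
    total = 0
    for stage, dc in enumerate(order):
        recovered.add(dc)
        # A datacenter serves content only if it is repaired AND reconnected.
        usable = recovered & reachable_at[stage]
        contents = set().union(*(hosted[d] for d in usable))
        total += sum(weight[c] for c in contents)
    return total


def best_order(hosted, weight, reachable_at):
    """Exhaustively search all recovery orders (only feasible for tiny instances)."""
    return max(
        permutations(sorted(hosted)),
        key=lambda order: cumulative_reachability(order, hosted, weight, reachable_at),
    )


plan = best_order(hosted, weight, reachable_at)
print(plan, cumulative_reachability(plan, hosted, weight, reachable_at))
# prints ('B', 'A', 'C') 22
```

In this toy instance, repairing datacenter A first is wasted effort because the network has not yet reconnected it, which illustrates why the datacenter-recovery order should be chosen with the network-recovery plan in mind. Exhaustive search grows factorially with the number of datacenters, so a realistic instance would need an optimization formulation such as the ILP described in the abstract.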