Recovering Critical Service After Large-Scale Failures With Bayesian Network Tomography
Viviana Arrigoni, Matteo Prata, Novella Bartolini
IEEE/ACM Transactions on Networking, Volume 32, Issue 6, pp. 5216-5231, published 2024-09-12
DOI: 10.1109/TNET.2024.3454478
Citations: 0
Abstract
Massive failures in communication networks result from natural disasters, heavy blackouts, and military and cyber attacks. After these events, an adequate network recovery plan is key to restoring emergency-critical services and preventing intolerable downtime and performance degradation. We tackle the problem of minimizing the time and number of interventions needed to restore the communication network sufficiently to support emergency services after large-scale failures. We propose Proton (Progressive RecOvery and Tomography-based mONitoring), an efficient algorithm for the progressive recovery of emergency services. Unlike previous work, which assumes centralized routing and complete network observability, Proton addresses the more realistic scenario in which the network relies on existing routing protocols and knowledge of the network state is partial and uncertain. Proton relies on Network Tomography to monitor and acquire information about the state of nodes and links. Simulation results on real topologies show that our algorithm outperforms previous solutions in terms of cumulative routed flow, repair costs, and recovery time in both static and dynamic failure scenarios.
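The abstract states that Proton relies on Network Tomography to learn the state of nodes and links from end-to-end measurements. The abstract does not describe Proton's inference procedure, and the title indicates a Bayesian (probabilistic) variant. The Python sketch below is therefore only a generic illustration of plain Boolean network tomography, not the paper's method. It assumes that monitoring paths are fixed by the routing protocol, that only nodes (not links) fail, and that each path measurement reports only whether the path works. The function name `infer_node_states` and the labels `working`, `failed`, and `unknown` are hypothetical.

```python
from typing import Dict, List, Set


def infer_node_states(paths: List[List[str]], path_ok: List[bool]) -> Dict[str, str]:
    """Classify nodes from end-to-end path outcomes (generic Boolean tomography sketch).

    Assumptions (hypothetical, not from the paper): only nodes fail, paths are
    fixed, and each measurement says whether the whole path works.
    """
    nodes: Set[str] = {n for p in paths for n in p}

    # Rule 1: every node on a working path must itself be working.
    working: Set[str] = set()
    for p, ok in zip(paths, path_ok):
        if ok:
            working.update(p)

    # Rule 2: if a failed path has exactly one node not known to be working,
    # that node must be the failed one.
    failed: Set[str] = set()
    for p, ok in zip(paths, path_ok):
        if not ok:
            suspects = {n for n in p if n not in working}
            if len(suspects) == 1:
                failed.update(suspects)

    # Any remaining node cannot be identified from these measurements alone.
    states: Dict[str, str] = {}
    for n in sorted(nodes):
        if n in working:
            states[n] = "working"
        elif n in failed:
            states[n] = "failed"
        else:
            states[n] = "unknown"
    return states


if __name__ == "__main__":
    paths = [["A", "B", "C"], ["A", "D", "C"], ["A", "D", "E"]]
    path_ok = [True, False, False]
    print(infer_node_states(paths, path_ok))
```

In this toy run, the working path A-B-C clears A, B, and C. The failed path A-D-C then leaves D as its only suspect, so D is classified as failed. The failed path A-D-E cannot single out E, so E stays `unknown`. Such unidentifiable nodes are where partial and uncertain knowledge arises, and where a probabilistic approach or additional monitoring paths would be needed.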
About the journal:
The high-level objective of the IEEE/ACM Transactions on Networking is to publish high-quality, original research results derived from theoretical or experimental exploration of communication and computer networking. It covers all kinds of information transport networks over all kinds of physical-layer technologies: wireline (any guided medium, e.g., copper or optical), wireless (e.g., radio-frequency, acoustic such as underwater, or infrared), or hybrids of these. The journal also welcomes applied contributions reporting on novel experiences and experiments with actual systems.