Correlating node centrality metrics with node resilience in self-healing systems with limited neighbourhood information
Arles Rodríguez, Ada Diaconescu, Johan Rodríguez, Jonatan Gómez
Journal: Future Generation Computer Systems-The International Journal of Escience
Publication date: 2024-10-10
DOI: 10.1016/j.future.2024.107553
URL: https://www.sciencedirect.com/science/article/pii/S0167739X2400517X
Abstract
Resilient systems must self-heal their components and connections to maintain their topology and function when failures occur. This ability is essential to many networked and distributed systems, e.g., virtualisation platforms, cloud services, microservice architectures and decentralised algorithms. This paper builds upon a self-healing approach where failed nodes are recreated and reconnected automatically based on topology information, which is maintained within each node’s neighbourhood. The paper makes two novel contributions. First, it offers a generic method for establishing the minimum size of the network neighbourhood that each node must know in order to recover the system’s component interconnection topology under a certain probability of node failure. This improves on the previous proposal by reducing resource consumption, as only local information is communicated and stored. Second, it adopts analysis techniques from complex networks theory to correlate a node’s recovery probability with its closeness centrality within the self-healing system. This makes it possible to strengthen a system’s resilience by analysing its topological characteristics and rewiring weakly-connected nodes. These contributions are supported by extensive simulation experiments on different systems with various topological characteristics. The results confirm that nodes which propagate their topology information to more neighbours are more likely to be recovered, at the cost of requiring more resources. The proposed contributions can help practitioners to: identify the most fragile nodes in their distributed systems; consider corrective measures by increasing each node’s connectivity; and establish a suitable compromise between system resilience and costs.
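To make the two quantities named in the abstract concrete, the following minimal sketch computes a node's closeness centrality and its k-hop neighbourhood on a toy graph. It uses the standard textbook definition of closeness centrality (number of other reachable nodes divided by the sum of hop distances to them); the abstract does not give the paper's exact formulation or its recovery algorithm, so the function names, the `adj` adjacency map and the five-node path graph are all assumptions made purely for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def closeness_centrality(adj, node):
    """Standard closeness: (reachable nodes - 1) / sum of hop distances.

    Returns 0.0 for an isolated node (no reachable neighbours).
    """
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def k_hop_neighbourhood(adj, node, k):
    """Nodes within k hops of `node`, excluding `node` itself."""
    dist = bfs_distances(adj, node)
    return {n for n, d in dist.items() if 0 < d <= k}


# Hypothetical toy topology: an undirected path a - b - c - d - e.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}

if __name__ == "__main__":
    for node in adj:
        print(node, round(closeness_centrality(adj, node), 3),
              sorted(k_hop_neighbourhood(adj, node, 2)))
```

In this toy graph the middle node `c` has the highest closeness and the end nodes `a` and `e` the lowest. Under the correlation the abstract reports, low-closeness nodes of this kind would be the candidates to examine for rewiring, although the sketch itself does not model failures or recovery.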
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.