Amr Adel , Noor H.S. Alani , Tony Jan , Mukesh Prasad
{"title":"A review of major ICT failures and recovery strategies: Strengthening digital resilience","authors":"Amr Adel , Noor H.S. Alani , Tony Jan , Mukesh Prasad","doi":"10.1016/j.cose.2025.104678","DOIUrl":null,"url":null,"abstract":"<div><div>This paper presents a comprehensive, cross-sector analysis of large-scale ICT failures to address the persistent gap in understanding how systemic digital breakdowns occur and propagate across platforms and industries. Through a comparative study of seven major global outages (2019–2024) — selected based on scale, technical transparency, and platform diversity — we identify recurring vulnerabilities in automation governance, configuration management, centralized infrastructure, and incident response. Using a custom analytical framework grounded in socio-technical and resilience engineering theory, the paper maps failure propagation patterns and derives a taxonomy of technical and organizational failure modes.</div><div>We empirically validate a suite of resilience strategies — including rollback automation, configuration-as-code, SOAR-enabled response orchestration, and chaos engineering — and demonstrate how they address failure propagation pathways observed in real-world incidents. A conceptual model for decentralized system upgrade planning is introduced, incorporating microservice segmentation, dependency mapping, and AI-assisted fault containment. The paper culminates in a forward-looking digital resilience roadmap that integrates predictive analytics, secure software supply chains, and adaptive human–machine collaboration. Core contributions include: (1) a cross-case classification of failure archetypes, (2) evidence-based design patterns for resilience, and (3) actionable frameworks for infrastructure operators and researchers working towards next-generation ICT robustness.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"159 ","pages":"Article 104678"},"PeriodicalIF":5.4000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404825003670","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
This paper presents a comprehensive, cross-sector analysis of large-scale ICT failures to address the persistent gap in understanding how systemic digital breakdowns occur and propagate across platforms and industries. Through a comparative study of seven major global outages (2019–2024) — selected based on scale, technical transparency, and platform diversity — we identify recurring vulnerabilities in automation governance, configuration management, centralized infrastructure, and incident response. Using a custom analytical framework grounded in socio-technical and resilience engineering theory, the paper maps failure propagation patterns and derives a taxonomy of technical and organizational failure modes.
We empirically validate a suite of resilience strategies — including rollback automation, configuration-as-code, SOAR-enabled response orchestration, and chaos engineering — and demonstrate how they address failure propagation pathways observed in real-world incidents. A conceptual model for decentralized system upgrade planning is introduced, incorporating microservice segmentation, dependency mapping, and AI-assisted fault containment. The paper culminates in a forward-looking digital resilience roadmap that integrates predictive analytics, secure software supply chains, and adaptive human–machine collaboration. Core contributions include: (1) a cross-case classification of failure archetypes, (2) evidence-based design patterns for resilience, and (3) actionable frameworks for infrastructure operators and researchers working towards next-generation ICT robustness.
期刊介绍:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.