Amr Adel , Noor H.S. Alani , Tony Jan , Mukesh Prasad
{"title":"主要信息通信技术故障和恢复战略综述:加强数字复原力","authors":"Amr Adel , Noor H.S. Alani , Tony Jan , Mukesh Prasad","doi":"10.1016/j.cose.2025.104678","DOIUrl":null,"url":null,"abstract":"<div><div>This paper presents a comprehensive, cross-sector analysis of large-scale ICT failures to address the persistent gap in understanding how systemic digital breakdowns occur and propagate across platforms and industries. Through a comparative study of seven major global outages (2019–2024) — selected based on scale, technical transparency, and platform diversity — we identify recurring vulnerabilities in automation governance, configuration management, centralized infrastructure, and incident response. Using a custom analytical framework grounded in socio-technical and resilience engineering theory, the paper maps failure propagation patterns and derives a taxonomy of technical and organizational failure modes.</div><div>We empirically validate a suite of resilience strategies — including rollback automation, configuration-as-code, SOAR-enabled response orchestration, and chaos engineering — and demonstrate how they address failure propagation pathways observed in real-world incidents. A conceptual model for decentralized system upgrade planning is introduced, incorporating microservice segmentation, dependency mapping, and AI-assisted fault containment. The paper culminates in a forward-looking digital resilience roadmap that integrates predictive analytics, secure software supply chains, and adaptive human–machine collaboration. Core contributions include: (1) a cross-case classification of failure archetypes, (2) evidence-based design patterns for resilience, and (3) actionable frameworks for infrastructure operators and researchers working towards next-generation ICT robustness.</div></div>","PeriodicalId":51004,"journal":{"name":"Computers & Security","volume":"159 ","pages":"Article 104678"},"PeriodicalIF":5.4000,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A review of major ICT failures and recovery strategies: Strengthening digital resilience\",\"authors\":\"Amr Adel , Noor H.S. Alani , Tony Jan , Mukesh Prasad\",\"doi\":\"10.1016/j.cose.2025.104678\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This paper presents a comprehensive, cross-sector analysis of large-scale ICT failures to address the persistent gap in understanding how systemic digital breakdowns occur and propagate across platforms and industries. Through a comparative study of seven major global outages (2019–2024) — selected based on scale, technical transparency, and platform diversity — we identify recurring vulnerabilities in automation governance, configuration management, centralized infrastructure, and incident response. Using a custom analytical framework grounded in socio-technical and resilience engineering theory, the paper maps failure propagation patterns and derives a taxonomy of technical and organizational failure modes.</div><div>We empirically validate a suite of resilience strategies — including rollback automation, configuration-as-code, SOAR-enabled response orchestration, and chaos engineering — and demonstrate how they address failure propagation pathways observed in real-world incidents. A conceptual model for decentralized system upgrade planning is introduced, incorporating microservice segmentation, dependency mapping, and AI-assisted fault containment. The paper culminates in a forward-looking digital resilience roadmap that integrates predictive analytics, secure software supply chains, and adaptive human–machine collaboration. Core contributions include: (1) a cross-case classification of failure archetypes, (2) evidence-based design patterns for resilience, and (3) actionable frameworks for infrastructure operators and researchers working towards next-generation ICT robustness.</div></div>\",\"PeriodicalId\":51004,\"journal\":{\"name\":\"Computers & Security\",\"volume\":\"159 \",\"pages\":\"Article 104678\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2025-09-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Security\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167404825003670\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Security","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167404825003670","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
A review of major ICT failures and recovery strategies: Strengthening digital resilience
This paper presents a comprehensive, cross-sector analysis of large-scale ICT failures to address the persistent gap in understanding how systemic digital breakdowns occur and propagate across platforms and industries. Through a comparative study of seven major global outages (2019–2024) — selected based on scale, technical transparency, and platform diversity — we identify recurring vulnerabilities in automation governance, configuration management, centralized infrastructure, and incident response. Using a custom analytical framework grounded in socio-technical and resilience engineering theory, the paper maps failure propagation patterns and derives a taxonomy of technical and organizational failure modes.
We empirically validate a suite of resilience strategies — including rollback automation, configuration-as-code, SOAR-enabled response orchestration, and chaos engineering — and demonstrate how they address failure propagation pathways observed in real-world incidents. A conceptual model for decentralized system upgrade planning is introduced, incorporating microservice segmentation, dependency mapping, and AI-assisted fault containment. The paper culminates in a forward-looking digital resilience roadmap that integrates predictive analytics, secure software supply chains, and adaptive human–machine collaboration. Core contributions include: (1) a cross-case classification of failure archetypes, (2) evidence-based design patterns for resilience, and (3) actionable frameworks for infrastructure operators and researchers working towards next-generation ICT robustness.
期刊介绍:
Computers & Security is the most respected technical journal in the IT security field. With its high-profile editorial board and informative regular features and columns, the journal is essential reading for IT security professionals around the world.
Computers & Security provides you with a unique blend of leading edge research and sound practical management advice. It is aimed at the professional involved with computer security, audit, control and data integrity in all sectors - industry, commerce and academia. Recognized worldwide as THE primary source of reference for applied research and technical expertise it is your first step to fully secure systems.