Towards antifragility of cloud systems: An adaptive chaos driven framework
Joseph S. Botros, Lamis F. Al-Qora'n, Amro Al-Said Ahmad
Information and Software Technology, Volume 174, Article 107519. Published 2024-06-22.
Impact factor: 3.8. JCR: Q2 (Computer Science, Information Systems).
DOI: 10.1016/j.infsof.2024.107519
https://www.sciencedirect.com/science/article/pii/S0950584924001241
Citations: 0
Abstract
Context
Unlike resilience, antifragility describes systems that get stronger rather than weaker under stress and chaos. Resilient systems focus on returning to their previous state after a failure, whereas antifragile systems overcome stressors and come out stronger. As technology environments become increasingly complex, there is a great need for software systems that benefit from failures and improve continuously. Most applications now operate in cloud environments, and as adoption of cloud-native systems grows, their distributed nature makes antifragility a requirement.
Objective
The paper proposes the UNFRAGILE framework, which facilitates the transformation of existing systems into antifragile systems. The framework employs chaos engineering to introduce failures incrementally and assess the system's response under such perturbation. It then improves the quality of that response by removing fragilities and introducing adaptive fault tolerance strategies.
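The idea of introducing failures incrementally and assessing the response at each level can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the UNFRAGILE implementation: the function names, the simulated dependency, the latency steps, and the 200 ms threshold are all hypothetical and chosen for illustration.

```python
import random
import statistics


def call_dependency(injected_latency_ms: float, rng: random.Random) -> float:
    """Simulated outbound call: baseline latency plus injected chaos latency (ms)."""
    baseline_ms = rng.uniform(20.0, 40.0)  # hypothetical healthy latency range
    return baseline_ms + injected_latency_ms


def run_incremental_experiment(latency_steps_ms, calls_per_step=50,
                               threshold_ms=200.0, seed=0):
    """Raise the injected latency step by step and record the response.

    Returns a list of (injected_ms, mean_observed_ms, violated) tuples and
    stops at the first step whose mean latency exceeds the threshold, which
    is treated here as the point where a fragility shows up.
    """
    rng = random.Random(seed)
    report = []
    for injected_ms in latency_steps_ms:
        observed = [call_dependency(injected_ms, rng) for _ in range(calls_per_step)]
        mean_ms = statistics.mean(observed)
        violated = mean_ms > threshold_ms
        report.append((injected_ms, mean_ms, violated))
        if violated:
            break  # do not escalate the chaos past the first observed fragility
    return report


if __name__ == "__main__":
    for injected, mean_ms, violated in run_incremental_experiment([0, 50, 100, 200, 400]):
        print(f"injected={injected}ms mean={mean_ms:.1f}ms violated={violated}")
```

Stopping at the first violation keeps the blast radius small, which is the usual motivation for escalating chaos gradually rather than injecting the worst case first.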
Method
The feasibility of the UNFRAGILE framework was validated by applying it to a cloud-native application with a real-world architecture, with the aim of enhancing its antifragility towards long outbound service latencies. An empirical investigation of fragility shows how chaos affects and disturbs application performance metrics. An adaptation phase is then put into effect to deal with chaotic network latency.
Results
The findings indicate that the system's behaviour in the antifragile stage resembles its behaviour in the steady stage. This suggests that, once it had determined from the environmental context that the dependent system was experiencing difficulties, the system could self-stabilise during the chaos without the need for a statically defined configuration.
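One way to picture self-stabilisation without a static configuration is a timeout that follows the observed latency of the dependent system instead of being fixed in advance. The abstract does not describe the paper's actual adaptation mechanism, so the sketch below is an assumption: it uses an exponentially weighted moving average (EWMA), and the class name and all parameter values are hypothetical.

```python
class AdaptiveTimeout:
    """Timeout derived from an EWMA of observed outbound latency (illustrative only)."""

    def __init__(self, initial_ms=100.0, alpha=0.2, margin=2.0,
                 floor_ms=50.0, ceiling_ms=5000.0):
        self.ewma_ms = initial_ms     # running latency estimate
        self.alpha = alpha            # weight given to the newest observation
        self.margin = margin          # headroom multiplier over the estimate
        self.floor_ms = floor_ms      # never time out faster than this
        self.ceiling_ms = ceiling_ms  # never wait longer than this

    def observe(self, latency_ms: float) -> None:
        """Fold one observed latency into the running estimate."""
        self.ewma_ms = self.alpha * latency_ms + (1 - self.alpha) * self.ewma_ms

    @property
    def timeout_ms(self) -> float:
        """Current timeout: the estimate times the margin, clamped to the bounds."""
        return min(self.ceiling_ms, max(self.floor_ms, self.margin * self.ewma_ms))


if __name__ == "__main__":
    timeout = AdaptiveTimeout()
    for _ in range(10):          # dependency becomes slow during chaos
        timeout.observe(1000.0)
    print(f"during chaos: {timeout.timeout_ms:.0f} ms")
    for _ in range(20):          # dependency recovers
        timeout.observe(100.0)
    print(f"after recovery: {timeout.timeout_ms:.0f} ms")
```

The timeout widens while the dependency is slow and narrows again once it recovers, so no single static value has to be tuned for both conditions.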
Conclusion
Overall, this paper contributes to ongoing efforts to develop antifragile software capable of adapting to rapidly changing, complex environments. It provides an operational framework for engineering software systems that learn and improve through exposure to failures rather than just surviving them.
About the journal:
Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development
Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "Negative" results and much more. Read the Guide for authors for more information.
The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.