Amro Al-Said Ahmad, Lamis F. Al-Qora’n, Ahmad Zayed
{"title":"Exploring the impact of chaos engineering with various user loads on cloud native applications: an exploratory empirical study","authors":"Amro Al-Said Ahmad, Lamis F. Al-Qora’n, Ahmad Zayed","doi":"10.1007/s00607-024-01292-z","DOIUrl":null,"url":null,"abstract":"<p>One of the most popular models that provide computer resources today is cloud computing. Today’s dynamic and successful platforms are created to take advantage of various resources available from service providers. Ensuring the performance and availability of such resources and services is a crucial problem. Any software system may be subject to faults that might propagate to cause failures. Such faults with the potential of contributing to failures are critical because they impair performance and result in a delayed reaction, which is regarded as a dependability problem. To ensure that critical faults can be discovered as soon as possible, the impact of such faults on the system must be tested. The performance and dependability of cloud-native systems are examined in this empirical study using fault injection, one of the chaos engineering techniques. The study explores the impacts and results of injecting various delay times into two cloud-native applications with diverse user numbers. The performance of the applications with various numbers of users is measured in relation to these delays, which accordingly reflects measuring the dependability of those systems. Firstly, the systems’ architecture were identified, and serverless with two Lambda functions and containerised microservices applications were chosen, which depend on utilising and incorporating cloud-native services. Secondly, faults are injected in order to quantify performance attributes such as throughput and latency. The results of several controlled experiments carried out in real-world cloud environments provide exploratory empirical data, which promoted comparisons and statistical analysis that we utilised to identify the behaviour of the application while experiencing stress. Typical results from this investigation include an overall reduction in performance that is embodied in an increase in latency with injecting delays. However, a remarkable result is noticed at a particular delay in which defects and availability problems appear out of nowhere. These findings assist in highlighting the value of using chaos engineering in general and fault injection in particular to assess the dependability of cloud-native applications and to find unpredicted failures that could arise quickly from defects that aren’t supposed to spread and result in dependability issues.</p>","PeriodicalId":10718,"journal":{"name":"Computing","volume":"18 1","pages":""},"PeriodicalIF":3.3000,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computing","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s00607-024-01292-z","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
One of the most popular models that provide computer resources today is cloud computing. Today’s dynamic and successful platforms are created to take advantage of various resources available from service providers. Ensuring the performance and availability of such resources and services is a crucial problem. Any software system may be subject to faults that might propagate to cause failures. Such faults with the potential of contributing to failures are critical because they impair performance and result in a delayed reaction, which is regarded as a dependability problem. To ensure that critical faults can be discovered as soon as possible, the impact of such faults on the system must be tested. The performance and dependability of cloud-native systems are examined in this empirical study using fault injection, one of the chaos engineering techniques. The study explores the impacts and results of injecting various delay times into two cloud-native applications with diverse user numbers. The performance of the applications with various numbers of users is measured in relation to these delays, which accordingly reflects measuring the dependability of those systems. Firstly, the systems’ architecture were identified, and serverless with two Lambda functions and containerised microservices applications were chosen, which depend on utilising and incorporating cloud-native services. Secondly, faults are injected in order to quantify performance attributes such as throughput and latency. The results of several controlled experiments carried out in real-world cloud environments provide exploratory empirical data, which promoted comparisons and statistical analysis that we utilised to identify the behaviour of the application while experiencing stress. Typical results from this investigation include an overall reduction in performance that is embodied in an increase in latency with injecting delays. However, a remarkable result is noticed at a particular delay in which defects and availability problems appear out of nowhere. These findings assist in highlighting the value of using chaos engineering in general and fault injection in particular to assess the dependability of cloud-native applications and to find unpredicted failures that could arise quickly from defects that aren’t supposed to spread and result in dependability issues.
期刊介绍:
Computing publishes original papers, short communications and surveys on all fields of computing. The contributions should be written in English and may be of theoretical or applied nature, the essential criteria are computational relevance and systematic foundation of results.