Authors: Lena Feinbube, Lukas Pirl, Peter Tröger, A. Polze
Venue: 2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT)
Published: 2017-12-01
DOI: 10.1109/PDCAT.2017.00078 (https://doi.org/10.1109/PDCAT.2017.00078)
Dependability Stress Testing of Cloud Infrastructures
Modern distributed systems have reached a level of complexity where software bugs and hardware failures are no longer exceptional, but a permanent operational threat. This holds especially for cloud infrastructures, which need to deliver resources to their customers under well-defined service-level agreements. Their dependability therefore needs to be assessed carefully. This article presents a structured approach for dependability stress testing in a cloud infrastructure. We automatically determine and inject the maximum number of simultaneous non-fatal errors in different variations. This puts the existing resiliency mechanisms under heavy load, so that they are tested for their effectiveness in corner cases. The starting point is a failure space dependability model of the system. It includes the notion of fault tolerance dependencies, which encode fault-triggering relations between different software layers. From the model, our deterministic algorithm automatically derives fault injection campaigns that maximize dependability stress. The article demonstrates the feasibility of the approach with an assessment of a fault-tolerant OpenStack cloud infrastructure deployment.
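The idea of injecting the maximum number of simultaneous non-fatal errors can be illustrated with a small sketch. It is not the algorithm from the article: it assumes the failure space model has already been reduced to a list of minimal cut sets (fault combinations that bring the whole system down), it ignores fault tolerance dependencies between software layers, and it uses brute-force enumeration. The function name `max_nonfatal_fault_sets` and the component names are hypothetical.

```python
from itertools import combinations


def max_nonfatal_fault_sets(components, cut_sets):
    """Return every largest fault set that contains no complete cut set.

    Assumption: a fault set is fatal exactly when it includes all
    members of at least one cut set.
    """
    cuts = [frozenset(c) for c in cut_sets]
    ordered = sorted(components)  # fixed order keeps the result deterministic
    # Try the largest candidate sets first and stop at the first size
    # for which at least one non-fatal set exists.
    for size in range(len(ordered), -1, -1):
        survivors = [
            set(combo)
            for combo in combinations(ordered, size)
            if not any(cut <= set(combo) for cut in cuts)
        ]
        if survivors:
            return survivors
    return []


# Hypothetical deployment: two redundant controllers and a three-node
# database cluster that needs a majority (two nodes) to stay available.
COMPONENTS = ["ctrl1", "ctrl2", "db1", "db2", "db3"]
CUT_SETS = [
    {"ctrl1", "ctrl2"},  # both controllers down
    {"db1", "db2"},      # database quorum lost
    {"db1", "db3"},
    {"db2", "db3"},
]

campaigns = max_nonfatal_fault_sets(COMPONENTS, CUT_SETS)
for faults in campaigns:
    print(sorted(faults))
```

In this toy model each campaign takes down one controller and one database node at the same time, which is the most stress the redundancy can absorb without a system failure. Exhaustive enumeration grows exponentially with the number of components, so it is only suitable for small models like this one.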