Domenico Cotroneo, L. Simone, Pietro Liguori, R. Natella, N. Bidokhti
{"title":"增强云计算系统中的故障传播分析","authors":"Domenico Cotroneo, L. Simone, Pietro Liguori, R. Natella, N. Bidokhti","doi":"10.1109/ISSRE.2019.00023","DOIUrl":null,"url":null,"abstract":"In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection with anomaly detection to identify the symptoms of failures. We evaluated the proposed approach in the context of the OpenStack cloud computing platform. We show that our model can significantly improve the accuracy of failure analysis in terms of false positives and negatives, with a low computational cost.","PeriodicalId":254749,"journal":{"name":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Enhancing Failure Propagation Analysis in Cloud Computing Systems\",\"authors\":\"Domenico Cotroneo, L. Simone, Pietro Liguori, R. Natella, N. Bidokhti\",\"doi\":\"10.1109/ISSRE.2019.00023\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection with anomaly detection to identify the symptoms of failures. We evaluated the proposed approach in the context of the OpenStack cloud computing platform. We show that our model can significantly improve the accuracy of failure analysis in terms of false positives and negatives, with a low computational cost.\",\"PeriodicalId\":254749,\"journal\":{\"name\":\"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)\",\"volume\":\"100 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-08-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISSRE.2019.00023\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISSRE.2019.00023","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Enhancing Failure Propagation Analysis in Cloud Computing Systems
In order to plan for failure recovery, the designers of cloud systems need to understand how their system can potentially fail. Unfortunately, analyzing the failure behavior of such systems can be very difficult and time-consuming, due to the large volume of events, non-determinism, and reuse of third-party components. To address these issues, we propose a novel approach that joins fault injection with anomaly detection to identify the symptoms of failures. We evaluated the proposed approach in the context of the OpenStack cloud computing platform. We show that our model can significantly improve the accuracy of failure analysis in terms of false positives and negatives, with a low computational cost.