{"title":"Failure Prediction for Cloud Applications through Ensemble Learning","authors":"Jomar Domingos","doi":"10.1109/ISSREW53611.2021.00095","journal":{"name":"2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)"},"publicationDate":"2021-10-01","citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/ISSREW53611.2021.00095"}
Failure Prediction for Cloud Applications through Ensemble Learning
Faults are an inherent threat to computer systems and software. Predicting system failures that may occur in the near future allows preventive actions to avoid or considerably reduce failure impact. In this work, we aim to develop a new methodology for failure prediction in cloud applications through ensemble machine learning. Our failure prediction approach consists of identifying sequences of system state patterns that precede failures (i.e., symptom detection), using failure datasets (obtained through realistic failure injection) to train different models. The resulting ensembles will subsequently be validated using fault injection. An aspect necessarily addressed in our research is the study of the timing properties of failures and their impact on the failure prediction task, since the feasibility of failure prediction is strictly coupled with the notion of lead time. Failure prediction is feasible if there is enough time both to predict the failure and to run prevention measures. Although cloud computing presents characteristics that allow applications to be more dependable (with high availability and reliability through fault tolerance mechanisms), the ability to take countermeasures before a failure occurs will make it possible to extend cloud-based solutions to critical application scenarios. Therefore, using machine learning (i.e., ensemble) models to predict failures is a promising path toward this goal.
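To make the two central ideas of the abstract concrete, the following is a minimal Python sketch, not the author's implementation. It shows (a) an ensemble raising an alarm by majority vote over sliding windows of system states, and (b) the lead-time condition under which a prediction is actionable. Everything named here is an assumption for illustration: the `(cpu %, memory %)` state layout, the three threshold rules standing in for trained models, the majority-vote combiner, and the reading of "enough time" as lead time covering prediction time plus prevention time.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical state layout: one monitoring sample as (cpu %, memory %).
State = Tuple[float, float]
Window = Sequence[State]
# A model maps a window of states to True when it sees failure symptoms.
Predictor = Callable[[Window], bool]


def sliding_windows(states: Sequence[State], size: int) -> List[Window]:
    """Return every run of `size` consecutive states."""
    return [states[i:i + size] for i in range(len(states) - size + 1)]


def majority_vote(models: Sequence[Predictor], window: Window) -> bool:
    """Ensemble decision: alarm when more than half of the models flag the window."""
    votes = sum(1 for model in models if model(window))
    return votes * 2 > len(models)


def prediction_is_feasible(lead_time_s: float,
                           prediction_time_s: float,
                           prevention_time_s: float) -> bool:
    """Assumed reading of the lead-time condition: the time left before the
    failure must cover both making the prediction and running prevention."""
    return lead_time_s >= prediction_time_s + prevention_time_s


# Stand-in "models": simple threshold rules, NOT trained learners.
def high_mean_cpu(window: Window) -> bool:
    return sum(cpu for cpu, _ in window) / len(window) > 70


def high_last_memory(window: Window) -> bool:
    return window[-1][1] > 65


def rising_cpu(window: Window) -> bool:
    return window[-1][0] - window[0][0] > 30


# Made-up trace of a system drifting toward failure.
states: List[State] = [(20, 30), (25, 35), (60, 50), (85, 70), (95, 90)]
models = [high_mean_cpu, high_last_memory, rising_cpu]

alarms = [majority_vote(models, w) for w in sliding_windows(states, size=3)]
print(alarms)  # [False, True, True]

print(prediction_is_feasible(lead_time_s=60, prediction_time_s=5, prevention_time_s=30))  # True
print(prediction_is_feasible(lead_time_s=20, prediction_time_s=5, prevention_time_s=30))  # False
```

In this toy trace the ensemble first raises an alarm on the second window, where two of the three rules agree. Whether that alarm is useful depends on the second check: with 20 seconds of lead time and 35 seconds needed to predict and react, the prediction arrives too late to act on.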