A Failure Prediction Model for Large Scale Cloud Applications using Deep Learning
Mohammad S. Jassas, Q. Mahmoud
2021 IEEE International Systems Conference (SysCon), 2021-04-15
DOI: 10.1109/SysCon48628.2021.9447141
Many cloud service providers face significant challenges in preventing hardware and software failures. Because cloud computing is large in scale and heterogeneous, cloud services continue to experience failures in their components. A significant proportion of previous studies have focused on characterizing failed jobs and understanding their behavior, while only a few have addressed failure prediction and, in particular, the accuracy of failure prediction models. This paper presents the development and implementation of a failure prediction model based on deep learning. The proposed model identifies tasks that are likely to fail before the failure occurs. Its main goal is to improve the performance of cloud applications by reducing the number of failed jobs. To investigate failure behavior and apply failure prediction in a large-scale environment, we used three different traces: Google Cluster Trace, Mustang, and Trinity. We evaluated the proposed model with several evaluation metrics to assess the accuracy of its predictions. The model is designed and implemented to achieve high prediction accuracy regardless of whether it is applied to a large or a small trace. The evaluation results show that the proposed model achieves high precision, recall, and F1 score.
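For readers unfamiliar with the reported metrics, the following minimal sketch shows how precision, recall, and F1 score are conventionally computed for a binary failure label. It is not the authors' evaluation code: the function name `precision_recall_f1`, the label encoding (1 = failed task, 0 = finished task), and the eight example labels are assumptions made purely for illustration.

```python
def precision_recall_f1(actual, predicted):
    """Return (precision, recall, f1) for the positive class (1 = failed)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # failures correctly predicted
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false alarms
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # missed failures

    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels for eight tasks (illustrative only, not data from the traces).
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Precision penalizes false alarms, recall penalizes missed failures, and F1 is their harmonic mean. Failed tasks are usually a minority in cluster traces, so these three metrics are generally more informative than plain accuracy.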