A Failure Prediction Model for Large Scale Cloud Applications using Deep Learning

Mohammad S. Jassas, Q. Mahmoud
Published in: 2021 IEEE International Systems Conference (SysCon), 2021-04-15
DOI: 10.1109/SysCon48628.2021.9447141 (https://doi.org/10.1109/SysCon48628.2021.9447141)
Citations: 3

Abstract

Many cloud service providers face significant challenges in preventing hardware and software failures. Because cloud computing is large-scale and heterogeneous, cloud services continue to experience failures in their components. A significant proportion of previous studies have focused on characterizing failed jobs and understanding their behavior, while only a few have focused on failure prediction and on increasing the accuracy of failure prediction models. This paper presents the development and implementation of a failure prediction model using a deep learning approach. The proposed model can identify tasks that are likely to fail before the failure occurs. The key aim of the failure prediction model is to improve the performance of cloud applications by reducing the number of failed jobs. To investigate failure behavior and apply failure prediction to large-scale environments, we used three different traces: Google Cluster Trace, Mustang, and Trinity. Moreover, we evaluated the proposed model using several evaluation metrics to ensure that it provides the highest accuracy of predicted values. The proposed model is designed and implemented to achieve high accuracy for failure prediction regardless of whether the trace is large or small. The evaluation results show that the proposed model achieved high precision, recall, and F1 score.
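The abstract reports precision, recall, and F1 score without giving their definitions. As a minimal sketch, assuming failure prediction is framed as binary classification (1 = task failed, 0 = task finished), the three metrics can be computed from true and predicted labels as shown below. The function name `precision_recall_f1` and the sample labels are hypothetical illustrations and are not taken from the paper.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for binary labels where 1 means 'task failed'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels for eight tasks (1 = failed, 0 = finished); not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In failure prediction, recall measures how many real failures the model catches, while precision measures how many predicted failures are real, so reporting both alongside F1 is more informative than accuracy alone when failed tasks are a minority class.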