A Machine Learning Framework for Job Failure Prediction in Cloud using Hyper Parameter Tuned MLP

K. Vani, S. Sujatha
Published in: 2022 Second International Conference on Advanced Technologies in Intelligent Control, Environment, Computing & Communication Engineering (ICATIECE), 2022-12-16
DOI: 10.1109/ICATIECE56365.2022.10047809 (https://doi.org/10.1109/ICATIECE56365.2022.10047809)
Citations: 0

Abstract

Failures are inevitable in cloud computing systems (CCS) because of their enormous size and complexity, and they reduce reliability and efficiency. Failure mitigation, fault tolerance, and recovery can improve the reliability and effectiveness of CCS. Failure prediction and risk identification approaches can forecast failures from data gathered during CCS operation. Standard runtime fault-tolerance (FT) solutions such as data replication and periodic checkpointing are not very successful on present-day, evolving computing systems. A robust method is therefore needed, one that combines thorough knowledge of component and system failures with the ability to accurately predict probable future system failures. In this research, we develop a paradigm for improving the reliability and efficiency of cloud environments through risk assessment. The study starts by analyzing the behavior of failed tasks and their related operational information. A predictive model is then built from three components: CatBoost to compute feature importance, a genetic algorithm to select the most relevant features, and a hyperparameter-tuned multilayer perceptron (MLP) to predict high-risk cloud tasks. The method is evaluated on Google cluster data with the necessary performance metrics and compared with other machine learning approaches.
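
The abstract does not give implementation details, so the following is only a minimal sketch of the genetic-algorithm feature selection stage, written with the Python standard library. Everything in it is an assumption made for illustration: the function name `genetic_feature_selection`, the population size, the mutation rate, the selection and crossover operators, and the `importances` values, which merely stand in for scores that a feature-importance step such as CatBoost might produce. The fitness function is also a toy; in a real pipeline it would more plausibly be the validation score of a classifier trained on the selected features.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=20, generations=30,
                              mutation_rate=0.1, seed=0):
    """Search for a 0/1 feature mask that maximises `fitness`.

    Illustrative only: all parameters are assumptions, not the paper's settings.
    Requires n_features >= 2 because of the single-point crossover.
    """
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags, one flag per feature.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick():
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each flag flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


# Hypothetical importance scores (placeholders, not results from the paper).
importances = [0.30, 0.02, 0.25, 0.01, 0.20, 0.03, 0.15, 0.04]


def toy_fitness(mask):
    # Reward the summed importance of selected features, penalise subset size.
    return sum(w for w, m in zip(importances, mask) if m) - 0.05 * sum(mask)


best_mask = genetic_feature_selection(len(importances), toy_fitness)
selected = [i for i, m in enumerate(best_mask) if m]
print("selected feature indices:", selected)
```

Because the best mask is carried forward unchanged in every generation, the fitness of the returned mask never falls below the best fitness seen in the initial population. The selected subset would then be passed to the MLP, whose hyperparameters are tuned separately.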