Authors: K. Vani, S. Sujatha
Published in: 2022 Second International Conference on Advanced Technologies in Intelligent Control, Environment, Computing & Communication Engineering (ICATIECE), 2022-12-16
DOI: https://doi.org/10.1109/ICATIECE56365.2022.10047809
A Machine Learning Framework for Job Failure Prediction in Cloud using Hyper Parameter Tuned MLP
Failures are inevitable in cloud computing systems (CCS) because of their enormous size and complexity, and they cause losses in reliability and efficiency. The reliability and effectiveness of CCS can be improved through failure mitigation, fault tolerance, and recovery. Failure prediction and risk identification approaches can forecast failures using data gathered during CCS operation. Standard runtime fault-tolerance (FT) solutions such as data replication and periodic checkpointing are not very successful at handling today's evolving computing systems. This makes it essential to have a robust method that combines a thorough knowledge of component and system failures with the capability to accurately predict probable future system failures. In this research, we develop a paradigm for improving the reliability and efficiency of cloud environments through risk assessment. The study starts by analyzing the behavior of failed tasks and their related operational information. A predictive model is then built in three stages: CatBoost to compute feature importance, a genetic algorithm to select the most relevant features, and a hyperparameter-tuned multilayer perceptron (MLP) to predict high-risk cloud tasks. The method is evaluated on Google cluster data using relevant performance metrics and compared with other machine learning approaches.
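The abstract does not give implementation details, so the following is only a minimal sketch of the genetic-algorithm feature-selection stage, not the authors' code. It makes several assumptions: the data is synthetic rather than the Google cluster trace, a simple nearest-centroid classifier stands in for the fitness model (the paper's CatBoost and MLP stages are not reproduced), and all function names and parameters (`make_data`, `centroid_accuracy`, `select_features`, the 0.01 per-feature penalty) are hypothetical.

```python
import random

def make_data(n=200, n_features=6, seed=0):
    """Synthetic stand-in for task records (label 1 = failed task).
    Assumption: only features 0 and 1 carry signal; the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(2.0 * label, 1.0) if j < 2 else rng.gauss(0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(mask, X_train, y_train, X_test, y_test):
    """Test accuracy of a nearest-centroid classifier using only masked features."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature set cannot classify anything
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(X_test, y_test):
        dist = {label: sum((x[j] - c) ** 2 for j, c in zip(cols, cen))
                for label, cen in centroids.items()}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y_test)

def select_features(fitness, n_features, pop_size=20, generations=15,
                    mutation_rate=0.1, seed=1):
    """Genetic algorithm over 0/1 feature masks with elitism."""
    rng = random.Random(seed)
    # Seed the population with the all-features mask plus random masks.
    population = [[1] * n_features] + [
        [rng.randint(0, 1) for _ in range(n_features)]
        for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        children = [ranked[0][:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    return max(population, key=fitness)

X, y = make_data()
X_train, y_train, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

def fitness(mask):
    # Reward held-out accuracy and lightly penalize each selected feature.
    acc = centroid_accuracy(mask, X_train, y_train, X_test, y_test)
    return acc - 0.01 * sum(mask)

best = select_features(fitness, n_features=6)
print("selected mask:", best, "fitness:", round(fitness(best), 3))
```

Because the best mask of each generation is copied forward unchanged and the initial population contains the all-features mask, the returned mask never scores below using every feature. In a fuller pipeline, the fitness function would presumably wrap the downstream classifier, and an importance ranking such as CatBoost's could be used to narrow the candidate features before the search.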