Data-Related Parameter Selection for Training Deep Learning Models Predicting Application Performance Degradation in Clouds
Behshid Shayesteh; Chunyan Fu; Amin Ebrahimzadeh; Roch H. Glitho
IEEE Transactions on Cloud Computing, vol. 13, no. 3, pp. 794-806, published 2025-03-14. DOI: 10.1109/TCC.2025.3570093. Available: https://ieeexplore.ieee.org/document/11003803/
Abstract
Applications deployed in clouds are susceptible to performance degradation due to diverse underlying causes such as infrastructure faults. To maintain the expected availability of these applications, Machine Learning (ML) models can be used to predict impending application performance degradations so that preventive measures can be taken. However, the prediction accuracy of these ML models, which is a key indicator of their performance, is influenced by several factors, including training data size, data sampling interval, input window, and prediction horizon. To optimize these data-related parameters, in this article we propose a surrogate-assisted multi-objective optimization algorithm whose objective is to maximize prediction model accuracy while minimizing the resources consumed for data collection and storage. We evaluated the proposed algorithm through two use cases focusing on the prediction of Key Performance Indicators (KPIs) for a 5G core network and a web application, each deployed in a Kubernetes-based cloud testbed. The results demonstrate that the proposed algorithm achieves a normalized hypervolume of 99.5% relative to the optimal Pareto front and reduces the search time for the optimal solution by 0.6 hours compared to other surrogates and by 3.58 hours compared to using no surrogate.
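To make the optimization loop concrete, below is a minimal Python sketch of surrogate-assisted multi-objective search over the four data-related parameters named in the abstract. Everything specific in it is an illustrative assumption rather than the paper's method: the candidate grid, the random-forest surrogate, the dummy expensive_eval stand-in for actually training a KPI-prediction model, and the resource_cost proxy.

```python
# Hypothetical sketch of surrogate-assisted multi-objective search over the
# four data-related parameters named in the abstract. The candidate grid,
# the dummy objective, the cost proxy, and the random-forest surrogate are
# all illustrative assumptions; the paper's actual algorithm is not shown.
import itertools
import random

from sklearn.ensemble import RandomForestRegressor

# Candidate values for each data-related parameter (illustrative ranges).
GRID = list(itertools.product(
    [1_000, 5_000, 10_000, 50_000],  # training data size (samples)
    [15, 30, 60, 120],               # sampling interval (seconds)
    [5, 10, 20, 40],                 # input window (time steps)
    [1, 3, 5, 10],                   # prediction horizon (time steps)
))

def expensive_eval(cfg):
    """Stand-in for training the KPI-prediction model with config `cfg` and
    measuring its accuracy; a real evaluation would take minutes to hours."""
    size, interval, window, horizon = cfg
    return min(0.9 + 0.01 * size ** 0.25 - 0.02 * horizon / window, 0.999)

def resource_cost(cfg):
    """Proxy cost: data collection/storage grows with the number of samples
    and with the sampling rate (shorter intervals are more expensive)."""
    size, interval, _, _ = cfg
    return size / interval

def dominates(a, b):
    """Pareto dominance for (accuracy, cost): higher accuracy, lower cost."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])

random.seed(0)
evaluated = {cfg: expensive_eval(cfg) for cfg in random.sample(GRID, 8)}

for _ in range(10):  # surrogate-guided search iterations
    X = [list(c) for c in evaluated]
    y = list(evaluated.values())
    surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    pending = [c for c in GRID if c not in evaluated]
    if not pending:
        break
    # Screen unevaluated configs with the cheap surrogate; spend the
    # expensive evaluation only on the most promising candidate.
    preds = surrogate.predict([list(c) for c in pending])
    best = pending[int(preds.argmax())]
    evaluated[best] = expensive_eval(best)

# Report the Pareto front of (accuracy, resource cost) among evaluated configs.
points = [(acc, resource_cost(cfg), cfg) for cfg, acc in evaluated.items()]
pareto = [p for p in points
          if not any(dominates(q[:2], p[:2]) for q in points)]
for acc, cost, cfg in sorted(pareto, key=lambda p: -p[0]):
    print(f"accuracy={acc:.3f}  cost={cost:.1f}  config={cfg}")
```

The accuracy function here is only a placeholder; in practice each evaluation trains and validates a deep learning model, which is precisely why screening candidates with a cheap surrogate before committing to an expensive evaluation pays off.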
About the Journal
The IEEE Transactions on Cloud Computing (TCC) is dedicated to the multidisciplinary field of cloud computing. It is committed to the publication of articles that present innovative research ideas, application results, and case studies in cloud computing, focusing on key technical issues related to theory, algorithms, systems, applications, and performance.