Data Labeling for Fault Detection in Cloud: A Test Suite-Based Active Learning Approach
Prateek Bagora, Amin Ebrahimzadeh, F. Wuhib, R. Glitho
2023 IEEE 9th International Conference on Network Softwarization (NetSoft), published 2023-06-19
DOI: 10.1109/NetSoft57336.2023.10175492
Abstract
Ensuring the quality of service of applications deployed in inherently complex and fault-prone cloud environments is of utmost concern. While machine learning-based fault management solutions help attain the desired reliability, they require labeled cloud metrics data for training and evaluation. Furthermore, the highly dynamic nature of cloud environments gives rise to emerging data distributions, which necessitate frequent relabeling of data for model adaptation. We propose a test suite-based active learning framework that automatically labels cloud metrics data with the corresponding cloud system state while accounting for emerging fault patterns and data or concept drift. We have implemented our solution on a cloud testbed and introduced various emerging data distribution scenarios to evaluate the proposed framework's labeling efficacy over both known and emerging data distributions. According to our results, the proposed framework achieves a 41% higher weighted F1-score and a 34% higher average AUC score than a system without any adaptation for emerging data distributions.
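For readers unfamiliar with the weighted F1-score reported above, the following is a minimal sketch of how that metric is commonly computed: the per-class F1 scores are averaged, each weighted by the number of true samples of that class. The function name `weighted_f1` and the state labels (`normal`, `cpu_fault`, `mem_fault`) are hypothetical and chosen for illustration only; the abstract does not specify the label set or the evaluation code used by the authors.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Classes are taken from y_true; a class that never occurs in y_true
    has zero support and therefore contributes nothing to the result.
    """
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true samples of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Hypothetical system-state labels, not taken from the paper.
y_true = ["normal", "normal", "normal", "cpu_fault", "cpu_fault", "mem_fault"]
y_pred = ["normal", "normal", "cpu_fault", "cpu_fault", "cpu_fault", "normal"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the weights follow class frequency, a rare fault class that is labeled poorly lowers this score less than a frequent one would, which is worth keeping in mind when comparing labeling quality on imbalanced cloud metrics data.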