Monitoring Workload Performance in Noisy Neighborhoods Using Performance Monitoring Units
Gaurav Chaudhary, Derssie Mebratu, Bryan Lewis, Rahul Khanna, Jun Jin, Mohammad Hossain
Published in: 2023 IEEE/ACM International Workshop on Cloud Intelligence & AIOps (AIOps), 2023-05-01
DOI: 10.1109/AIOps59134.2023.00007 (https://doi.org/10.1109/AIOps59134.2023.00007)
Citations: 0
Abstract
Cloud service providers often overbook their data centers to maximize utilization of compute resources, which means that different containerized workloads share the same hardware. Because co-tenant workloads are unpredictable and little is known about them, multiple workloads can end up competing for limited shared resources. In such cases a co-tenant workload, known as a noisy neighbor, may dominate one or more shared resources, degrading the performance of other workloads and affecting quality of service (QoS). This paper presents two approaches for detecting the performance degradation of a workload subjected to a noisy neighbor. Both infer degradation from high-dimensional performance data collected by the performance monitoring unit (PMU) hardware built into the processor. The first approach combines feature selection, dimensionality reduction, and Bayesian Gaussian mixture models to model performance and to infer the likelihood of abnormal performance on new, unseen data. The second approach uses a subspace tracking technique to follow the changing subspace of the high-dimensional performance data and thereby infer changes in workload performance. Both algorithms have a computationally intensive offline part but are lightweight when used to predict performance on new data. This enables near-real-time tracking of application performance and opens up possibilities for real-time optimization of workload performance.
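
To make the idea behind the second approach concrete, here is a minimal sketch of subspace tracking on synthetic data. The abstract does not name the tracking algorithm, so the sketch substitutes Oja's rule, a classic rank-one tracker, as a stand-in; it is not the authors' method. The six-dimensional "counter" vectors, the `quiet` and `noisy` directions, the learning rate, and all function names are hypothetical choices made for illustration. The sketch learns a baseline direction offline, then scores each new sample by the fraction of its energy that falls outside the tracked direction before updating the tracker.

```python
import math
import random

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residual(u, x):
    """Fraction of the energy of x lying outside direction u (0 = fully explained)."""
    total = dot(x, x)
    if total == 0.0:
        return 0.0
    return max(0.0, 1.0 - dot(u, x) ** 2 / total)

def oja_step(u, x, lr):
    """One update of Oja's rule, renormalized so u stays a unit vector."""
    y = dot(u, x)
    return normalize([ui + lr * y * (xi - y * ui) for ui, xi in zip(u, x)])

def sample(direction, rng, noise=0.05):
    """Synthetic counter vector: a random load level along `direction` plus noise."""
    load = rng.uniform(1.0, 2.0)
    return [load * d + rng.gauss(0.0, noise) for d in direction]

def demo(seed=0, lr=0.05):
    rng = random.Random(seed)
    quiet = normalize([1, 1, 1, 0, 0, 0])  # hypothetical counter mix, workload alone
    noisy = normalize([0, 0, 0, 1, 1, 1])  # hypothetical mix under a noisy neighbor
    u = normalize([1.0] * 6)               # arbitrary starting direction

    # Offline phase: learn the baseline direction from quiet-period samples.
    for _ in range(500):
        u = oja_step(u, sample(quiet, rng), lr)

    def stream(direction, n):
        """Online phase: score each sample against the tracker, then update it."""
        nonlocal u
        scores = []
        for _ in range(n):
            x = sample(direction, rng)
            scores.append(residual(u, x))
            u = oja_step(u, x, lr)
        return scores

    def mean(xs):
        return sum(xs) / len(xs)

    baseline = stream(quiet, 100)
    shifted = stream(noisy, 300)
    # Mean residual: before the shift, right after it, and once the tracker adapts.
    return mean(baseline), mean(shifted[:10]), mean(shifted[-50:])

baseline, early, late = demo()
print(f"baseline={baseline:.3f} after_shift={early:.3f} adapted={late:.3f}")
```

In this toy setup the residual should stay near zero while the workload behaves as it did during the offline phase, jump toward one when the sample direction changes, and fall again once the tracker has followed the data to its new direction. Each online step costs only a few dot products, which is in the spirit of the lightweight prediction stage the abstract describes. Real PMU data would call for a higher-rank subspace and a tuned threshold on the residual.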