Gaurav Chaudhary, Derssie Mebratu, Bryan Lewis, Rahul Khanna, Jun Jin, Mohammad Hossain
{"title":"Monitoring Workload Performance in Noisy Neighborhoods Using Performance Monitoring Units","authors":"Gaurav Chaudhary, Derssie Mebratu, Bryan Lewis, Rahul Khanna, Jun Jin, Mohammad Hossain","doi":"10.1109/AIOps59134.2023.00007","DOIUrl":"https://doi.org/10.1109/AIOps59134.2023.00007","url":null,"abstract":"Cloud service providers often overbook data centers to maximize utilization of compute resources. This often involves sharing compute resources between different containerized workloads. The unpredictability of, and lack of knowledge about, co-tenant workloads can lead to scenarios where multiple workloads compete for limited shared resources. Such scenarios are often accompanied by performance degradation of some workloads when a co-tenant workload, a.k.a. a noisy neighbor, dominates the utilization of one or more shared resources, and hence negatively affects other workloads and influences the quality of service (QoS). This paper presents two approaches to detect performance degradation of a workload subjected to a noisy neighbor. We use high-dimensional performance data obtained from performance monitoring unit (PMU) hardware built into the processor to infer performance degradation. Our first approach uses a combination of feature selection, dimensionality reduction, and Bayesian Gaussian mixture models to model the performance and infer the likelihood of abnormal performance on new, unseen data. In the second approach, we use a subspace tracking technique to track the changing subspace of the high-dimensional performance data to infer the changing workload performance. Both algorithms have a computationally intensive offline part but are lightweight when used for performance prediction on new data. This offers a way to track application performance in near real time and opens up possibilities for real-time optimization of workload performance.","PeriodicalId":427858,"journal":{"name":"2023 IEEE/ACM International Workshop on Cloud Intelligence & AIOps (AIOps)","volume":"27 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133043102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Salman Ahmed, Muskaan Singh, Brendan Doherty, E. Ramlan, Kathryn Harkin, M. Bucholc, Damien Coyle
{"title":"Knowledge-based Intelligent System for IT Incident DevOps","authors":"Salman Ahmed, Muskaan Singh, Brendan Doherty, E. Ramlan, Kathryn Harkin, M. Bucholc, Damien Coyle","doi":"10.1109/AIOps59134.2023.00005","DOIUrl":"https://doi.org/10.1109/AIOps59134.2023.00005","url":null,"abstract":"The automation of IT incident management (i.e., handling of any unusual events that hamper the quality of IT services) is a main focus in Artificial Intelligence for IT Operations (AIOps). The success and reputation of large-scale firms depend on their customer service and helpdesk system. These systems tend to handle client requests and track customer service agent interactions. In this research, we present a complete knowledge-based system that automates two core components of IT incident service management (ITSM): (1) Ticket Assignment Group (TAG) and (2) Incident Resolution (IR). Our proposed system bypasses the four core steps of the traditional ITSM process, including data investigation, event correlation, situation room collaboration, and probable root cause. It provides immediate solutions that can save companies key performance indicator (KPI) resources and reduce the mean time to resolution (MTTR). The experiment used an industrial, real-time ITSM dataset from a prominent IT organization comprising 500,000 real-time incident descriptions with encoded labels. Furthermore, our systems are also evaluated on an open-source dataset. Compared to the existing benchmark methodologies, there is a 5% improvement in accuracy. The study demonstrates AI automation capabilities in incident handling (TAG and IR) for large real-world IT systems.","PeriodicalId":427858,"journal":{"name":"2023 IEEE/ACM International Workshop on Cloud Intelligence & AIOps (AIOps)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133328151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ali Kazemi Arani, Mansooreh Zahedi, T. H. Le, M. A. Babar
{"title":"SoK: Machine Learning for Continuous Integration","authors":"Ali Kazemi Arani, Mansooreh Zahedi, T. H. Le, M. A. Babar","doi":"10.1109/AIOps59134.2023.00006","DOIUrl":"https://doi.org/10.1109/AIOps59134.2023.00006","url":null,"abstract":"Continuous Integration (CI) has become a well-established software development practice for automatically and continuously integrating code changes during software development. An increasing number of Machine Learning (ML) based approaches for automation of CI phases are being reported in the literature. It is timely and relevant to provide a Systemization of Knowledge (SoK) of ML-based approaches for CI phases. This paper reports an SoK of different aspects of the use of ML for CI. Our systematic analysis also highlights the deficiencies of the existing ML-based solutions that can be improved for advancing the state-of-the-art.","PeriodicalId":427858,"journal":{"name":"2023 IEEE/ACM International Workshop on Cloud Intelligence & AIOps (AIOps)","volume":"53 Pt A 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133780722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}