perfCorrelate: Performance variability correlation framework

IF 6.2 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, THEORY & METHODS

Future Generation Computer Systems-The International Journal of Escience, Pub Date: 2025-04-03, DOI: 10.1016/j.future.2025.107827

Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi, Nicola Calabretta, George Exarchakos

Volume 170, Article 107827

Abstract

Edge computing is a promising technology for deploying time-sensitive and privacy-sensitive applications closer to users' premises. However, to meet user requirements effectively, it is crucial to identify the sources of performance variability caused by application co-location. Monitoring systems typically expose hundreds of metrics, which makes comprehensive analysis challenging. As a result, researchers often rely on a small, arbitrarily selected subset of metrics for tasks such as building performance predictors. In this paper, we examine how the available monitoring metrics correlate with Round Trip Time (RTT) fluctuations and suggest directions for building performance models. Our experiments focus on Single Particle Analysis (SPA) applications for an electron microscopy use case, deployed in a Kubernetes environment and monitored by Prometheus. We demonstrate that, while a subset of monitoring metrics consistently correlates with performance, the specific metrics in this subset can vary with dynamic application co-locations and with the observation window. Consequently, the optimal number of metrics and the choice of machine learning model needed to capture performance variability accurately differ between scenarios (co-locations and cluster nodes). These differences directly affect the effectiveness of scheduling decisions in resource clusters, since those decisions depend on performance predictors. Our work presents a method to systematically identify the monitoring metrics most relevant to changes in RTT and to determine the most representative observation window, ensuring a more generalizable understanding of the application's performance throughout its lifecycle.
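The abstract does not specify the correlation measure, the ranking rule, or how the "most representative" observation window is chosen. As a rough illustration only, the sketch below assumes Pearson correlation against RTT, ranks metrics by absolute correlation inside sliding observation windows, and treats as most representative the window whose top-k metric set agrees best (mean Jaccard similarity) with the top-k sets of all other windows. These choices, the function names, and the toy metric names (`cpu_usage`, `mem_bytes`, `net_rx`) are hypothetical and are not taken from perfCorrelate.

```python
import math

# Hypothetical sketch, not the perfCorrelate implementation.
# Assumptions: Pearson correlation, top-k ranking by |r|, and the
# representative window chosen by mean Jaccard agreement of top-k sets.


def pearson(xs, ys):
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no information about RTT
    return sxy / math.sqrt(sxx * syy)


def top_k_metrics(metrics, rtt, start, end, k):
    """Return the k metric names with the largest |r| against RTT on samples [start, end)."""
    window_rtt = rtt[start:end]
    scores = {
        name: abs(pearson(series[start:end], window_rtt))
        for name, series in metrics.items()
    }
    # Sort by descending |r|; ties are broken by name so the result is deterministic.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def most_representative_window(metrics, rtt, window, step, k):
    """Slide a window over the samples and return (start, top_k) for the window
    whose top-k metric set agrees most, on average, with every other window's."""
    starts = list(range(0, len(rtt) - window + 1, step))
    if not starts:
        raise ValueError("window is longer than the series")
    tops = {s: top_k_metrics(metrics, rtt, s, s + window, k) for s in starts}

    def mean_agreement(s):
        others = [jaccard(tops[s], tops[t]) for t in starts if t != s]
        return sum(others) / len(others) if others else 1.0

    best = max(starts, key=mean_agreement)  # max() keeps the first window on ties
    return best, tops[best]


# Toy data invented for illustration: one RTT series and three metric series.
rtt = [10, 12, 11, 15, 14, 18, 17, 20]
metrics = {
    "cpu_usage": [x * 0.1 for x in rtt],  # tracks RTT linearly
    "mem_bytes": [100] * 8,              # constant, so it scores 0
    "net_rx": [5, 4, 6, 3, 5, 2, 4, 1],  # negatively correlated with RTT
}

print(top_k_metrics(metrics, rtt, 0, len(rtt), 2))  # -> ['cpu_usage', 'net_rx']
print(most_representative_window(metrics, rtt, window=4, step=2, k=1))  # -> (0, ['cpu_usage'])
```

Pearson only captures linear association; a rank-based measure such as Spearman's coefficient would be a natural alternative when a metric relates to RTT monotonically but non-linearly. In a real deployment, the series would first have to be fetched (for example through Prometheus range queries) and aligned to the RTT sampling timestamps; that step is omitted here.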

Source journal

Future Generation Computer Systems-The International Journal of Escience (Engineering & Technology / Computer Science: Theory & Methods)

CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months

Journal description: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.