perfCorrelate: Performance variability correlation framework

Panagiotis Giannakopoulos, Bart van Knippenberg, Kishor Chandra Joshi, Nicola Calabretta, George Exarchakos

Future Generation Computer Systems: The International Journal of eScience, Volume 170, Article 107827
DOI: 10.1016/j.future.2025.107827
Published online: 2025-04-03 (Journal Article)

Abstract:
Edge computing is a promising technology for deploying time-sensitive and privacy-sensitive applications closer to the premises of users. However, it is crucial to identify the sources of performance variability caused by application co-location in order to meet user requirements effectively. Monitoring systems typically expose hundreds of metrics, making comprehensive analysis challenging. As a result, researchers often rely on a small, arbitrarily selected subset of metrics for tasks such as building performance predictors. In this paper, we examine how the available monitoring metrics correlate with Round Trip Time (RTT) fluctuations and suggest directions for building performance models. Our experiments focus on a Single Particle Analysis (SPA) application for an electron microscopy use case, deployed in a Kubernetes environment and monitored by Prometheus. We demonstrate that while a subset of monitoring metrics consistently correlates with performance, the specific metrics in this subset can vary with dynamic application co-locations and observation windows. Consequently, the optimal number of metrics and the choice of machine learning model needed to accurately capture performance variability vary between scenarios (co-locations and cluster nodes). These differences directly impact the effectiveness of scheduling decisions in resource clusters, which depend on performance predictors. Our work presents a method to systematically identify the monitoring metrics most relevant to changes in RTT and to determine the most representative observation window, ensuring a more generalizable understanding of the application's performance throughout its lifecycle.
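To make the correlation idea concrete, the sketch below ranks monitoring metrics by how strongly, and how consistently, they correlate with RTT across sliding observation windows. It is a minimal illustration only, assuming the metric and RTT time series have already been exported from Prometheus into a time-indexed pandas DataFrame; the function name, the use of Spearman rank correlation, and the window and step sizes are illustrative assumptions, not the perfCorrelate implementation.

```python
# Illustrative sketch only: rank monitoring metrics by how strongly and how
# consistently they correlate with RTT over sliding observation windows.
# Assumes the metric and RTT samples were already scraped from Prometheus and
# aligned on a common timestamp index; the window length, the choice of
# Spearman correlation, and all names below are assumptions, not the paper's code.
import pandas as pd
from scipy.stats import spearmanr

def rank_metrics_by_rtt_correlation(df: pd.DataFrame,
                                    rtt_col: str = "rtt_ms",
                                    window: int = 60,
                                    step: int = 30) -> pd.DataFrame:
    """Correlate every metric column with RTT inside sliding windows.

    df     : time-indexed frame; one column per monitoring metric plus RTT.
    window : observation-window length in samples.
    step   : stride between consecutive windows in samples.
    Returns one row per metric with the mean absolute correlation and its
    spread across windows, so a caller can pick a subset per scenario.
    """
    metric_cols = [c for c in df.columns if c != rtt_col]
    rows = []
    for start in range(0, len(df) - window + 1, step):
        chunk = df.iloc[start:start + window]
        for metric in metric_cols:
            rho, _ = spearmanr(chunk[metric], chunk[rtt_col])
            rows.append({"metric": metric, "rho": rho})
    per_window = pd.DataFrame(rows).dropna()
    summary = (per_window.groupby("metric")["rho"]
               .agg(mean_abs_rho=lambda s: s.abs().mean(),
                    rho_std="std")
               .sort_values("mean_abs_rho", ascending=False))
    return summary

# Example: pick the ten metrics most consistently tied to RTT fluctuations.
# top10 = rank_metrics_by_rtt_correlation(samples).head(10)
```

Summarising each metric by its mean absolute correlation together with its spread across windows mirrors the point made in the abstract: the correlated subset shifts with co-location and with the chosen observation window, so any such ranking would need to be recomputed per scenario rather than fixed once.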
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.