An extensible lightweight framework for distributed telemetry of microservices

IF 5.7 | CAS Zone 3 (Computer Science) | JCR Q1: COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Sustainable Computing-Informatics & Systems | Pub Date: 2025-02-26 | DOI: 10.1016/j.suscom.2025.101100

Manuel Otero, José María García, Pablo Fernandez

Volume 46, Article 101100

https://www.sciencedirect.com/science/article/pii/S2210537925000204

Citations: 0

Abstract


An extensible lightweight framework for distributed telemetry of microservices

Microservice architectures have become the standard for developing scalable distributed systems that offer significant benefits in managing the integration and evolution of complex applications. However, they face challenges in effectively diagnosing and resolving performance and reliability issues. Traditional centralized telemetry models and cloud-based monitoring platforms often require complex or costly configurations and are not optimized for RESTful microservices. In fact, although the OpenAPI Specification (OAS) has become a key standard for describing microservice APIs, existing telemetry tools do not leverage this information to enhance service analysis and diagnostics. This paper introduces a lightweight and distributed approach to telemetry that uses OAS-based API information, offering an automated, configuration-free system that enables developers and operations teams to perform root cause analysis more efficiently. Moreover, we propose a plugin system to incorporate intelligent behavior into the telemetry system, such as an adaptive proactive alert mechanism when response-time anomalies are detected. By incorporating this extensibility mechanism, the framework paves the way to address issues such as energy consumption and performance, allowing the system to dynamically adjust its monitoring activities to optimize resource usage and minimize the carbon footprint of microservice deployment and execution. This adaptability reduces operational overhead and supports sustainable computing practices. To validate our approach, we present a proof-of-concept in the form of a ready-to-use package for the NodeJS ecosystem, demonstrating that this distributed telemetry model can operate with minimal impact on system performance and resource usage, proving its effectiveness to support more robust and sustainable IT systems.
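The abstract does not show the package's API, so the following is only a minimal sketch of the two ideas it describes: grouping telemetry by OAS path template, and a plugin that raises an alert when a response time deviates from the recent baseline. Every name here (`TelemetryRecord`, `TelemetryPlugin`, `Telemetry`, `ResponseTimeAlertPlugin`, `matchOperation`) is hypothetical and chosen for illustration. The anomaly rule (mean plus `k` standard deviations, tracked with Welford's online update) is an assumption, not the authors' method.

```typescript
// Hypothetical types for illustration; not the API of the authors' package.
interface TelemetryRecord {
  method: string;
  path: string; // concrete request path, e.g. "/users/42"
  durationMs: number;
}

interface TelemetryPlugin {
  onRecord(operation: string, record: TelemetryRecord): void;
}

// Turn an OAS path template such as "/users/{id}" into an anchored RegExp.
function templateToRegExp(template: string): RegExp {
  const source = template
    .split(/(\{[^}]+\})/)
    .map((part) =>
      /^\{[^}]+\}$/.test(part)
        ? "[^/]+" // a path parameter matches exactly one segment
        : part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") // escape literal text
    )
    .join("");
  return new RegExp(`^${source}$`);
}

// Map a concrete request to "METHOD template". The first matching template
// wins, so callers should list concrete paths before templated ones.
function matchOperation(templates: string[], method: string, path: string): string | undefined {
  const hit = templates.find((t) => templateToRegExp(t).test(path));
  return hit === undefined ? undefined : `${method.toUpperCase()} ${hit}`;
}

type Stats = { n: number; mean: number; m2: number };

// Assumed rule: alert when a duration exceeds mean + k * stddev of the
// samples seen so far for that operation, after a short warm-up.
class ResponseTimeAlertPlugin implements TelemetryPlugin {
  readonly alerts: string[] = [];
  private readonly stats = new Map<string, Stats>();
  private readonly k: number;
  private readonly warmup: number;

  constructor(k = 3, warmup = 5) {
    this.k = k;
    this.warmup = warmup;
  }

  onRecord(operation: string, record: TelemetryRecord): void {
    const s = this.stats.get(operation) ?? { n: 0, mean: 0, m2: 0 };
    if (s.n >= this.warmup) {
      const std = Math.sqrt(s.m2 / (s.n - 1));
      if (record.durationMs > s.mean + this.k * std) {
        this.alerts.push(`${operation}: ${record.durationMs} ms`);
      }
    }
    // Welford's online update; anomalous samples also move the baseline.
    s.n += 1;
    const delta = record.durationMs - s.mean;
    s.mean += delta / s.n;
    s.m2 += delta * (record.durationMs - s.mean);
    this.stats.set(operation, s);
  }
}

// Minimal collector: resolves the operation and fans out to plugins.
class Telemetry {
  private readonly templates: string[];
  private readonly plugins: TelemetryPlugin[] = [];

  constructor(templates: string[]) {
    this.templates = templates;
  }

  use(plugin: TelemetryPlugin): void {
    this.plugins.push(plugin);
  }

  record(r: TelemetryRecord): void {
    const op = matchOperation(this.templates, r.method, r.path);
    if (op === undefined) return; // path not described in the OAS document
    for (const p of this.plugins) p.onRecord(op, r);
  }
}

// Usage: six ordinary requests, then one slow request to the same operation.
const telemetry = new Telemetry(["/users/{id}", "/orders"]);
const alerting = new ResponseTimeAlertPlugin();
telemetry.use(alerting);
for (const ms of [20, 22, 19, 21, 20, 23]) {
  telemetry.record({ method: "get", path: "/users/42", durationMs: ms });
}
telemetry.record({ method: "get", path: "/users/7", durationMs: 400 });
```

Keying the statistics by path template rather than by concrete URL means `/users/42` and `/users/7` share one baseline, which is the kind of grouping that API-description-aware telemetry makes possible without manual configuration.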


Source journal

Sustainable Computing-Informatics & Systems | COMPUTER SCIENCE, HARDWARE & ARCHITECTURE; COMPUTER SCIENCE, INFORMATION SYSTEMS

CiteScore: 10.70

Self-citation rate: 4.40%

Articles published: 142

About the journal: Sustainable computing is a rapidly expanding research area spanning computer science and engineering, electrical engineering, and other engineering disciplines. The aim of Sustainable Computing: Informatics and Systems (SUSCOM) is to publish research findings related to energy-aware and thermal-aware management of computing resources. Equally important is a spectrum of related research issues, such as applications of computing that can have ecological and societal impacts. SUSCOM publishes original and timely research papers and survey articles on power, energy, temperature, and environment-related topics of current importance to readers. SUSCOM has an editorial board comprising prominent researchers from around the world and selects competitively evaluated, peer-reviewed papers.