REMO: Resource-Aware Application State Monitoring for Large-Scale Distributed Systems

2009 29th IEEE International Conference on Distributed Computing Systems. Publication date: 2009-06-22. DOI: 10.1109/ICDCS.2009.15

S. Meng, Srinivas R. Kashyap, C. Venkatramani, Ling Liu


Citations: 31

Abstract

To observe, analyze and control large-scale distributed systems and the applications hosted on them, there is an increasing need to continuously monitor the performance attributes and application states of distributed systems. This results in application state monitoring tasks that require fine-grained attribute information to be collected efficiently from relevant nodes. Existing approaches either treat multiple application state monitoring tasks independently and build ad-hoc monitoring trees for each task, or construct a single static monitoring tree for multiple tasks. We argue that careful planning of multiple application state monitoring tasks, jointly considering multi-task optimization and node-level resource constraints, can provide significant gains in performance and scalability. In this paper, we present REMO, a REsource-aware application state MOnitoring system. REMO produces a forest of optimized monitoring trees through iterations of two phases: one phase explores cost-sharing opportunities via estimation, and the other refines the monitoring plan through resource-sensitive tree construction. Our experimental results include those gathered by deploying REMO on a BlueGene/P rack running IBM's large-scale distributed streaming system, System S. Running over 200 monitoring tasks for an application deployed across 200 nodes, REMO achieves a 35%-45% decrease in the percentage error of collected attributes compared to existing schemes.
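The abstract does not spell out REMO's algorithms, so the following is only a minimal Python sketch of what "resource-sensitive tree construction" and a cost-sharing estimate can look like, under loudly stated assumptions: a hypothetical linear cost model in which every sent or received message costs `PER_MESSAGE + PER_VALUE * v` resource units for `v` attribute values, no in-network aggregation, and a greedy rule that attaches each node to the shallowest parent that keeps every node within its capacity. The names `PER_MESSAGE`, `PER_VALUE`, `estimated_sharing_gain`, and `build_tree` are illustrative inventions, not the paper's API or its exact method.

```python
from collections import deque

# Hypothetical cost model (an assumption for illustration, not REMO's published model):
# sending or receiving one message that carries v attribute values costs
# PER_MESSAGE + PER_VALUE * v resource units per monitoring interval.
PER_MESSAGE = 1.0
PER_VALUE = 0.1


def message_cost(values):
    return PER_MESSAGE + PER_VALUE * values


def estimated_sharing_gain(nodes_a, nodes_b):
    """Crude estimate of the benefit of merging two tasks into one tree: each
    shared node sends one combined message instead of two, saving one
    per-message overhead per shared node."""
    return PER_MESSAGE * len(set(nodes_a) & set(nodes_b))


def subtree_values(children, node, attrs):
    """Values forwarded upward by `node`, assuming no in-network aggregation."""
    return attrs[node] + sum(subtree_values(children, c, attrs) for c in children[node])


def node_load(children, node, attrs, root):
    """Receive one message per child, plus send one message to the parent."""
    load = sum(message_cost(subtree_values(children, c, attrs)) for c in children[node])
    if node != root:
        load += message_cost(subtree_values(children, node, attrs))
    return load


def bfs_order(children, root):
    """Tree nodes from shallowest to deepest, returned as a list snapshot."""
    order, queue = [], deque([root])
    while queue:
        n = queue.popleft()
        order.append(n)
        queue.extend(children[n])
    return order


def build_tree(root, nodes, attrs, capacity):
    """Greedily attach each node to the shallowest parent that keeps every
    node's load within its capacity; nodes that fit nowhere are returned."""
    children = {root: []}
    unplaced = []
    # Hypothetical heuristic: place high-capacity nodes first so they end up
    # near the root, where relaying load is highest.
    for n in sorted(nodes, key=lambda x: -capacity[x]):
        placed = False
        for p in bfs_order(children, root):
            children[p].append(n)  # tentative attachment
            children[n] = []
            if all(node_load(children, v, attrs, root) <= capacity[v] for v in children):
                placed = True
                break
            children[p].remove(n)  # undo the tentative attachment
            del children[n]
        if not placed:
            unplaced.append(n)
    return children, unplaced
```

For instance, with a collector `"r"` of capacity 2.5 and three nodes reporting one value each, only two nodes fit directly under the root, so the third is placed one level down, and a node whose capacity cannot cover even one outgoing message is returned in `unplaced`. A full planner in the spirit of the abstract would alternate between choosing which tasks to merge (using some gain estimate) and rebuilding trees under these resource limits.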



