REMO: Resource-Aware Application State Monitoring for Large-Scale Distributed Systems

S. Meng, Srinivas R. Kashyap, C. Venkatramani, Ling Liu
{"title":"REMO: Resource-Aware Application State Monitoring for Large-Scale Distributed Systems","authors":"S. Meng, Srinivas R. Kashyap, C. Venkatramani, Ling Liu","doi":"10.1109/ICDCS.2009.15","DOIUrl":null,"url":null,"abstract":"To observe, analyze and control large scale distributed systems and the applications hosted on them, there is an increasing need to continuously monitor performance attributes of distributed system and application states. This results in application state monitoring tasks that require fine-grained attribute information to be collected from relevant nodes efficiently. Existing approaches either treat multiple application state monitoring tasks independently and build ad-hoc monitoring trees for each task, or construct a single static monitoring tree for multiple tasks. We argue that a careful planning of multiple application state monitoring tasks by jointly considering multi-task optimization and node level resource constraints can provide significant gains in performance and scalability. In this paper, we present REMO, a REsource-aware application state MOnitoring system. REMO produces a forest of optimized monitoring trees through iterations of two phases, one phase exploring cost sharing opportunities via estimation and the other refining the monitoring plan through resource-sensitive tree construction. Our experimental results include those gathered by deploying REMO on a BlueGene/P rack running IBM's large-scale distributed streaming system - System S. Using REMO running over 200 monitoring tasks for an application deployed across 200 nodes results in a 35%-45% decrease in the percentage error of collected attributes compared to existing schemes.","PeriodicalId":387968,"journal":{"name":"2009 29th IEEE International Conference on Distributed Computing Systems","volume":"9 33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-06-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"31","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 29th IEEE International Conference on Distributed Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDCS.2009.15","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 31

Abstract

To observe, analyze and control large scale distributed systems and the applications hosted on them, there is an increasing need to continuously monitor performance attributes of distributed system and application states. This results in application state monitoring tasks that require fine-grained attribute information to be collected from relevant nodes efficiently. Existing approaches either treat multiple application state monitoring tasks independently and build ad-hoc monitoring trees for each task, or construct a single static monitoring tree for multiple tasks. We argue that a careful planning of multiple application state monitoring tasks by jointly considering multi-task optimization and node level resource constraints can provide significant gains in performance and scalability. In this paper, we present REMO, a REsource-aware application state MOnitoring system. REMO produces a forest of optimized monitoring trees through iterations of two phases, one phase exploring cost sharing opportunities via estimation and the other refining the monitoring plan through resource-sensitive tree construction. Our experimental results include those gathered by deploying REMO on a BlueGene/P rack running IBM's large-scale distributed streaming system - System S. Using REMO running over 200 monitoring tasks for an application deployed across 200 nodes results in a 35%-45% decrease in the percentage error of collected attributes compared to existing schemes.
大规模分布式系统的资源感知应用状态监控
为了观察、分析和控制大型分布式系统及其上托管的应用程序,对分布式系统的性能属性和应用程序状态进行持续监控的需求越来越大。这导致应用程序状态监视任务需要有效地从相关节点收集细粒度属性信息。现有的方法要么独立地处理多个应用程序状态监视任务,并为每个任务构建专门的监视树,要么为多个任务构建单个静态监视树。我们认为,通过联合考虑多任务优化和节点级资源约束来仔细规划多个应用程序状态监控任务,可以在性能和可伸缩性方面获得显着的收益。本文提出了一种基于资源感知的应用状态监控系统——REMO。REMO通过两个阶段的迭代生成一个优化的监测树森林,一个阶段通过估算探索成本共享机会,另一个阶段通过资源敏感树构建来细化监测计划。我们的实验结果包括在运行IBM大规模分布式流系统system s的BlueGene/P机架上部署REMO收集的结果。使用REMO为部署在200个节点上的应用程序运行超过200个监控任务,与现有方案相比,收集属性的百分比误差降低了35%-45%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信