Achieving Robust Self-Management for Large-Scale Distributed Applications

A. Al-Shishtawy, Muhammad Fayyaz, K. Popov, Vladimir Vlassov
Published in: 2010 Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems
Publication date: 2010-09-27
DOI: 10.1109/SASO.2010.42 (https://doi.org/10.1109/SASO.2010.42)
Citations: 8

Abstract

Achieving self-management can be challenging, particularly in dynamic environments with resource churn (joins, leaves, and failures). Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of robust management elements (RMEs), which are able to heal themselves under continuous churn. Using RMEs allows the developer to separate the issue of dealing with the effect of churn on management from the management logic. This facilitates the development of robust management by letting the developer focus on managing the application while relying on the platform to provide the robustness of management. RMEs can be implemented as fault-tolerant long-living services. We present a generic approach and an associated algorithm to achieve fault-tolerant long-living services. Our approach is based on replicating a service using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. The algorithm uses P2P replica placement schemes to place replicas and uses the P2P overlay to monitor them. The replicated state machine is extended to analyze monitoring data in order to decide when and where to migrate. We describe how to use our approach to achieve robust management elements. We present a simulation-based evaluation of our approach, which shows its feasibility.
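
To make the idea of placing replicas with a P2P scheme and deciding when and where to migrate more concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes a ring-shaped identifier space, a symmetric replica placement rule (replica i lives at the identifier offset by i times the ring size divided by the replication degree), and a successor-style lookup for the responsible node. The names `ID_SPACE`, `replica_ids`, `responsible_node`, and `plan_migration` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left

ID_SPACE = 2 ** 16  # assumed size of the overlay identifier space


def replica_ids(service_id, degree, id_space=ID_SPACE):
    """Identifiers at which the replicas of one service are placed.

    Assumption: symmetric placement, i.e. replicas are spread evenly
    around the ring starting from the service identifier.
    """
    step = id_space // degree
    return [(service_id + i * step) % id_space for i in range(degree)]


def responsible_node(ident, live_nodes):
    """First live node at or clockwise after ident (wraps around the ring)."""
    ring = sorted(live_nodes)
    idx = bisect_left(ring, ident)
    return ring[idx % len(ring)]


def plan_migration(service_id, current_hosts, live_nodes, degree):
    """Compare current replica hosts with the nodes now responsible.

    Returns {replica_index: (old_host, new_host)} for every replica whose
    responsible node has changed because of churn. An empty dict means
    no migration is needed.
    """
    plan = {}
    for i, ident in enumerate(replica_ids(service_id, degree)):
        target = responsible_node(ident, live_nodes)
        if current_hosts[i] != target:
            plan[i] = (current_hosts[i], target)
    return plan


# Usage: four nodes host the four replicas of service 10.
hosts = [100, 20000, 40000, 60000]
# Node 40000 leaves and node 35000 joins; only replica 2 has to move.
migration = plan_migration(10, hosts, [100, 20000, 35000, 60000], degree=4)
```

The sketch only computes which replicas should move. In the approach described above, such a decision would be taken by the replicated state machine itself and applied as a reconfiguration of the replica set, a step this example leaves out.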