Achieving Robust Self-Management for Large-Scale Distributed Applications

2010 Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems Pub Date : 2010-09-27 DOI:10.1109/SASO.2010.42

A. Al-Shishtawy, Muhammad Fayyaz, K. Popov, Vladimir Vlassov


Citations: 8

Abstract

Achieving self-management can be challenging, particularly in dynamic environments with resource churn (joins, leaves, and failures). Dealing with the effect of churn on management increases the complexity of the management logic and thus makes its development time-consuming and error-prone. We propose the abstraction of robust management elements (RMEs), which are able to heal themselves under continuous churn. Using RMEs allows the developer to separate the issue of dealing with the effect of churn on management from the management logic itself. This facilitates the development of robust management by letting the developer focus on managing the application while relying on the platform to provide the robustness of management. RMEs can be implemented as fault-tolerant long-living services. We present a generic approach and an associated algorithm to achieve fault-tolerant long-living services. Our approach is based on replicating a service using finite state machine replication with a reconfigurable replica set. Our algorithm automates the reconfiguration (migration) of the replica set in order to tolerate continuous churn. The algorithm uses P2P replica placement schemes to place replicas and uses the P2P overlay to monitor them. The replicated state machine is extended to analyze monitoring data in order to decide when and where to migrate. We describe how to use our approach to achieve robust management elements. We present a simulation-based evaluation of our approach which shows its feasibility.
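The placement and migration decision described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a ring-shaped identifier space, an evenly spaced placement of replica keys, and a successor-style lookup, none of which the abstract specifies. All names (`ID_SPACE`, `replica_keys`, `responsible_node`, `plan_migration`) are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left

ID_SPACE = 2 ** 16  # assumed size of the overlay identifier space


def replica_keys(service_id, degree):
    """Spread `degree` replica keys evenly around the ring (assumed placement scheme)."""
    step = ID_SPACE // degree
    return [(service_id + i * step) % ID_SPACE for i in range(degree)]


def responsible_node(key, nodes):
    """Return the first node id at or after `key`, wrapping around the ring."""
    ring = sorted(nodes)
    idx = bisect_left(ring, key)
    return ring[idx % len(ring)]


def configuration(service_id, degree, nodes):
    """Map each replica key to the node currently responsible for it."""
    return [responsible_node(k, nodes) for k in replica_keys(service_id, degree)]


def plan_migration(old_config, service_id, degree, nodes):
    """Compare the running replica set with the placement implied by current membership.

    Returns the new configuration if at least one replica must move, else None.
    """
    new_config = configuration(service_id, degree, nodes)
    return new_config if new_config != old_config else None


# Usage: four replicas of service 10 on a four-node overlay.
nodes = {100, 20000, 40000, 60000}
running = configuration(10, 4, nodes)

# Churn: node 40000 fails and node 35000 joins.
nodes = {100, 20000, 35000, 60000}
proposed = plan_migration(running, 10, 4, nodes)
```

In a replicated state machine, a decision like `proposed` would itself be submitted as a command so that all replicas agree on the same new replica set; that agreement step is omitted here.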
