Flexible failure handling for cooperative processes in distributed systems

2009 5th International Conference on Collaborative Computing: Networking, Applications and Worksharing. Pub Date: 2009-12-28. DOI: 10.4108/ICST.COLLABORATECOM2009.8306

Artin Avanes, J. Freytag



Abstract

Distributed systems will increasingly be built on top of wireless networks, such as sensor networks or hand-held devices with advanced sensing and computational abilities. Supporting cooperative processes executed by such unreliable and dynamic system components poses a number of new technical challenges. In terms of recovery, limited resource capabilities have to be considered during the re-scheduling of failed process activities. In terms of concurrency, a non-blocking protocol is required to allow a high degree of parallelism. In this paper, we introduce a flexible and resource-oriented failure handling mechanism for cooperative processes in hierarchical and distributed systems. The objective is to ensure both transactional semantics and the selection of suitable nodes with respect to available resource capabilities. Based on a nested execution model, we develop a multi-stage algorithm that uses constraint solving techniques in a parallel fashion, thus achieving more efficient recovery. We evaluate our proposed techniques in a prototype implementation and demonstrate significant performance gains from parallel re-scheduling.
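The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of resource-aware, parallel re-scheduling, not the authors' method. It assumes that failed activities are grouped by cluster, that each activity has a single numeric resource demand, and that each node has a single numeric spare capacity. The names `assign`, `reschedule`, and the cluster, node, and activity labels are all hypothetical. Each cluster's placement problem is solved by a small backtracking search, and the clusters are processed concurrently with a thread pool purely for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def assign(activities, capacity, plan=None):
    """Backtracking search: map each activity to a node with enough spare capacity.

    activities: list of (name, demand) pairs (hypothetical format).
    capacity:   dict node -> spare capacity; modified during the search.
    Returns a dict activity -> node, or None if no feasible placement exists.
    """
    plan = {} if plan is None else plan
    if len(plan) == len(activities):
        return dict(plan)
    name, demand = activities[len(plan)]
    for node in sorted(capacity):
        if capacity[node] >= demand:
            capacity[node] -= demand      # tentatively place the activity
            plan[name] = node
            result = assign(activities, capacity, plan)
            if result is not None:
                return result
            del plan[name]                # undo and try the next node
            capacity[node] += demand
    return None


def reschedule(failed_by_cluster, capacity_by_cluster):
    """Solve one placement problem per cluster, with clusters handled concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = {
            cluster: pool.submit(assign, acts, dict(capacity_by_cluster[cluster]))
            for cluster, acts in failed_by_cluster.items()
        }
        return {cluster: f.result() for cluster, f in futures.items()}


# Illustrative data only; none of these values come from the paper.
failed = {
    "cluster_a": [("measure", 3), ("aggregate", 2)],
    "cluster_b": [("notify", 4)],
}
capacity = {
    "cluster_a": {"n1": 3, "n2": 2},
    "cluster_b": {"n3": 1, "n4": 5},
}
plans = reschedule(failed, capacity)
print(plans)
```

Because each cluster works on its own copy of its capacity table, the sub-problems are independent and can run side by side; a cluster with no feasible placement yields `None`, which a caller could treat as a signal to escalate to a higher level of the hierarchy.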


