Service Recovery for Large Scale Distributed Publish and Subscription Services for Cyber-Physical Systems and Disaster Management

2014 IEEE International Conference on Cyber-Physical Systems, Networks, and Applications. Pub Date: 2014-08-25. DOI: 10.1109/CPSNA.2014.27

C. Shih, Hsin-Yi Chen, Zi-You Yeh


Citation count: 5

Abstract

Information and communication technology (ICT) has played a critical role in disaster management over the last few decades. One example is the messaging service for disaster alerts, rescue workers, and victims. Many of these messaging services are developed on top of existing messaging services, in an ad hoc manner, to meet the communication requirements of different disaster scenarios. However, most, if not all, existing messaging services are designed under the assumption that the underlying network infrastructure is mostly reliable. Unfortunately, this assumption does not hold during and after disasters. In this work, we designed and implemented a service recovery framework for publish/subscribe messaging services for disaster management, comprising a landmark-based (centralized) algorithm and a distributed algorithm. The developed mechanisms recover a failed service without manual effort. The centralized algorithm uses a landmark node to monitor the services and recover a failed one; the distributed algorithm is Paxos-based and compiles a consistent recovery plan among the nodes monitoring the failed service. We evaluated the performance of the two mechanisms and discussed the appropriate use scenario for each. The results show that the centralized algorithm should only be used in a service network with no concurrent failures within a local area network, whereas the distributed algorithm is sensitive neither to concurrent failures nor to the size of the service network.
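The abstract does not spell out the distributed protocol beyond calling it Paxos-based, so the following is only a minimal sketch of textbook single-decree Paxos, not the authors' implementation. It illustrates how several monitoring nodes that detect the same failure could settle on one recovery plan even when they propose different ones. The names `Acceptor` and `propose`, the in-memory message passing, and the plan strings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """One monitoring node acting as a Paxos acceptor (hypothetical model)."""
    promised: int = -1                  # highest ballot promised so far
    accepted_ballot: int = -1           # ballot of the accepted plan, if any
    accepted_plan: Optional[str] = None

    def prepare(self, ballot: int) -> Tuple[bool, int, Optional[str]]:
        # Phase 1b: promise to ignore lower ballots and report any accepted plan.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_plan
        return False, self.accepted_ballot, self.accepted_plan

    def accept(self, ballot: int, plan: str) -> bool:
        # Phase 2b: accept unless a higher ballot has already been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_plan = plan
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, plan: str) -> Optional[str]:
    """Run one Paxos round; return the chosen plan, or None without a majority."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [(b, p) for ok, b, p in replies if ok]
    if len(promises) < majority:
        return None

    # If some acceptor already accepted a plan, adopt the one with the
    # highest ballot instead of our own; this is what keeps plans consistent.
    _, prior_plan = max(promises, key=lambda bp: bp[0])
    if prior_plan is not None:
        plan = prior_plan

    # Phase 2: ask the acceptors to accept the (possibly adopted) plan.
    votes = sum(a.accept(ballot, plan) for a in acceptors)
    return plan if votes >= majority else None


# Two monitors propose different plans for the same failed service.
monitors = [Acceptor() for _ in range(3)]
first = propose(monitors, ballot=1, plan="restart broker on node-B")
second = propose(monitors, ballot=2, plan="restart broker on node-C")
print(first)
print(second)
```

In this sketch the second proposer ends up adopting the plan already chosen in the first round, so both calls return the same plan. A real deployment would also need unique ballot numbers per proposer, message loss handling, and retries, none of which are modeled here.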
