Components and Analysis of Disaster Tolerant Computing

2007 IEEE International Performance, Computing, and Communications Conference Pub Date : 2007-04-11 DOI:10.1109/PCCC.2007.358917

Chad M. Lawler, Michael A. Harper, Mitchell A. Thornton


Citations: 4

Abstract


Components and Analysis of Disaster Tolerant Computing

This paper reviews the components of disaster-tolerant computing and communications and assesses the current state of the field in light of recent man-made terrorist events. It examines the relationships among disaster-tolerant systems, information technology (IT) application availability, and the executive-level management visibility necessary for successful system operation during a catastrophic disaster: one that causes rapid, almost simultaneous, multiple points of failure in a system, as well as single points of failure that escalate into wide catastrophic system failures. The technology, process, and human resource challenges of traditional disaster recovery approaches to disaster preparedness are outlined. The risks of IT application downtime that stem from increasing dependence on critical IT applications operating in distributed and unbounded networks are explored. A general method for disaster tolerance is proposed that mitigates unplanned downtime through disciplined IT infrastructure design based on redundancy and distributed components, with special attention to the ability of executive-level management to comprehend the value of an application's uptime and make appropriate capital investments. The paper also explores the importance of executive visibility into the system-wide impact of downtime and its effect on the cost of downtime for critical systems.
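To make the link between redundancy, uptime, and downtime cost concrete, the following is a minimal sketch using the textbook parallel-availability formula; it is not the method proposed in the paper. The function names, the 99% per-site availability, and the cost of 10,000 per hour are hypothetical values chosen for illustration. The sketch also assumes sites fail independently, which is optimistic for exactly the kind of near-simultaneous, multi-point failures the abstract describes.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def parallel_availability(site_availability: float, sites: int) -> float:
    """Availability of `sites` redundant sites when any single site suffices.

    Assumes independent failures (an illustrative simplification; correlated
    disasters would make the real figure lower).
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # The system is down only when every site is down at the same time.
    return 1.0 - (1.0 - site_availability) ** sites


def expected_annual_downtime_hours(availability: float) -> float:
    """Expected hours of unplanned downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


def expected_annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime, given a hypothetical hourly cost."""
    return expected_annual_downtime_hours(availability) * cost_per_hour


if __name__ == "__main__":
    hourly_cost = 10_000.0  # hypothetical cost of one hour of downtime
    for n in (1, 2):
        a = parallel_availability(0.99, n)
        hours = expected_annual_downtime_hours(a)
        cost = expected_annual_downtime_cost(a, hourly_cost)
        print(f"{n} site(s): availability={a:.6f}, "
              f"downtime={hours:.3f} h/yr, cost={cost:,.0f}/yr")
```

Under these assumptions, adding a second independent site cuts expected downtime by a factor of 100, which is the kind of figure that lets management weigh the capital cost of redundancy against the cost of downtime.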

