Timely Virtual Machine Migration for Pro-active Fault Tolerance

2011 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops. Pub Date: 2011-03-28. DOI: 10.1109/ISORCW.2011.42

A. Polze, Peter Tröger, Felix Salfner


Citations: 35

Abstract


Next-generation processor and memory technologies will provide greatly increased computing and memory capacities for application scaling. However, this comes at a price: due to the growing number of transistors and shrinking structure sizes, the overall reliability of future server systems is expected to suffer significantly. This makes reactive fault tolerance schemes less appropriate for server applications under reliability and timeliness constraints. We propose an architectural blueprint for managing server system dependability in a pro-active fashion, in order to keep service-level promises for response times and availability even with increasing hardware failure rates. We introduce the concept of anticipatory virtual machine migration, which proactively moves computation away from faulty or suspicious machines. The migration decision is based on health indicators at various system levels that are combined into a global probabilistic reliability measure. Based on this measure, live migration techniques can be triggered to move computation to healthy machines even before a failure brings the system down.
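The abstract does not say how the health indicators are combined, so the following is only a minimal sketch of one possible reading. It assumes each system level reports an estimated failure probability for a look-ahead window, that these estimates are statistically independent, and that migration is triggered once the combined value crosses a fixed threshold. The names `HealthIndicator`, `global_failure_probability`, `should_migrate`, and the threshold value are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class HealthIndicator:
    """One failure estimate from a single system level (hypothetical type)."""
    level: str           # e.g. "hardware", "hypervisor", "os"
    failure_prob: float  # estimated failure probability in the look-ahead window


def global_failure_probability(indicators):
    """Combine per-level estimates into one probability.

    Assumption (ours, not the paper's): the levels fail independently,
    so P(any failure) = 1 - prod(1 - p_i).
    """
    for ind in indicators:
        if not 0.0 <= ind.failure_prob <= 1.0:
            raise ValueError(f"{ind.level}: probability out of range")
    return 1.0 - prod(1.0 - ind.failure_prob for ind in indicators)


def should_migrate(indicators, threshold=0.2):
    """Return True when the combined estimate reaches the (assumed) threshold."""
    return global_failure_probability(indicators) >= threshold


if __name__ == "__main__":
    readings = [
        HealthIndicator("hardware", 0.10),
        HealthIndicator("os", 0.20),
    ]
    print(global_failure_probability(readings))
    print(should_migrate(readings))
```

In a real deployment the `should_migrate` decision would hand off to the hypervisor's live migration facility; that step is left out here because the abstract does not name a specific virtualization platform.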

