Timely Virtual Machine Migration for Pro-active Fault Tolerance
A. Polze, Peter Tröger, Felix Salfner
2011 14th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops, 2011-03-28
DOI: 10.1109/ISORCW.2011.42 (https://doi.org/10.1109/ISORCW.2011.42)
Next-generation processor and memory technologies will provide substantially greater computing and memory capacity for application scaling. However, this comes at a price: due to the growing number of transistors and shrinking structure sizes, the overall reliability of future server systems is expected to decline significantly. This makes reactive fault tolerance schemes less appropriate for server applications under reliability and timeliness constraints. We propose an architectural blueprint for managing server system dependability proactively, in order to keep service-level promises for response times and availability even with increasing hardware failure rates. We introduce the concept of anticipatory virtual machine migration, which proactively moves computation away from faulty or suspicious machines. The migration decision is based on health indicators at various system levels that are combined into a global probabilistic reliability measure. Based on this measure, live migration techniques can be triggered to move computation to healthy machines before a failure brings the system down.
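The abstract does not state how the health indicators are combined or at what level of risk a migration is triggered. As a minimal illustrative sketch only, the following Python code assumes that each system level reports an independent failure-probability estimate and that the global measure is the probability that at least one level fails. The names `HealthIndicator`, `global_failure_probability`, and `should_migrate`, the level labels, the independence assumption, and the threshold value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class HealthIndicator:
    """One failure-probability estimate from a single system level (hypothetical type)."""
    level: str           # illustrative label, e.g. "hardware", "hypervisor", "os"
    failure_prob: float  # estimated probability of failure within a look-ahead window


def global_failure_probability(indicators):
    """Combine per-level estimates into one measure.

    Assumption of this sketch: the levels fail independently, so
    P(at least one level fails) = 1 - product(1 - p_i).
    """
    for ind in indicators:
        if not 0.0 <= ind.failure_prob <= 1.0:
            raise ValueError(f"{ind.level}: probability must lie in [0, 1]")
    return 1.0 - prod(1.0 - ind.failure_prob for ind in indicators)


def should_migrate(indicators, threshold=0.2):
    """Return True when the combined risk reaches the threshold.

    The default threshold of 0.2 is an arbitrary placeholder.
    """
    return global_failure_probability(indicators) >= threshold


if __name__ == "__main__":
    healthy = [HealthIndicator("hardware", 0.10), HealthIndicator("os", 0.05)]
    suspicious = [HealthIndicator("hardware", 0.50), HealthIndicator("os", 0.05)]
    print(should_migrate(healthy))     # False: combined risk is below 0.2
    print(should_migrate(suspicious))  # True: combined risk is above 0.2
```

In a real deployment the `True` branch would hand the virtual machine to the hypervisor's live migration facility. That step is omitted here because the abstract does not name a specific platform.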