ReHype: enabling VM survival across hypervisor failures

International Conference on Virtual Execution Environments | Pub Date: 2011-03-09 | DOI: 10.1145/1952682.1952692

Michael V. Le, Y. Tamir


Citations: 36

Abstract

With existing virtualized systems, hypervisor failures lead to overall system failure and the loss of all work in progress in the virtual machines (VMs) running on the system. We introduce ReHype, a mechanism for recovery from hypervisor failures by booting a new instance of the hypervisor while preserving the state of running VMs. VMs are stalled during the hypervisor reboot and resume normal execution once the new hypervisor instance is running. Hypervisor failures can lead to arbitrary state corruption and inconsistencies throughout the system. ReHype deals with the challenge of protecting the recovered hypervisor instance from such corrupted state and resolving inconsistencies between different parts of hypervisor state, as well as between the hypervisor and VMs and between the hypervisor and the hardware. We have implemented ReHype for the Xen hypervisor. The implementation was done incrementally, using results from fault injection experiments to identify the sources of dangerous state corruption and inconsistencies. The implementation of ReHype involved only 880 LOC added or modified in Xen. The memory space overhead of ReHype is only 2.1 MB for a pristine copy of the hypervisor code and static data plus a small reserved memory area. The fault injection campaigns used to evaluate the effectiveness of ReHype involved a system with multiple VMs running I/O- and hypercall-intensive benchmarks. Our experimental results show that the ReHype prototype can successfully recover from over 90% of detected hypervisor failures.
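The recovery flow described in the abstract (stall the VMs, boot a new hypervisor instance from a pristine copy, keep VM state in place, resume) can be illustrated with a toy model. The sketch below is not ReHype's or Xen's code; every name in it (`VM`, `Hypervisor`, `PRISTINE_STATIC_DATA`, `boot_hypervisor`, `recover`) is hypothetical, and it ignores the hard parts the paper addresses, such as corrupted state and hypervisor/hardware inconsistencies.

```python
from dataclasses import dataclass, field
from enum import Enum


class VMState(Enum):
    RUNNING = "running"
    STALLED = "stalled"


@dataclass
class VM:
    name: str
    memory: dict  # stands in for guest memory and register state (assumption)
    state: VMState = VMState.RUNNING


@dataclass
class Hypervisor:
    static_data: dict                        # restored from a pristine copy on boot
    vms: dict = field(default_factory=dict)  # VMs managed by this instance


# Hypothetical stand-in for the pristine copy of hypervisor static data.
PRISTINE_STATIC_DATA = {"scheduler": "default", "version": 1}


def boot_hypervisor() -> Hypervisor:
    """Boot a fresh instance from the pristine copy, never from the failed one."""
    return Hypervisor(static_data=dict(PRISTINE_STATIC_DATA))


def recover(failed: Hypervisor) -> Hypervisor:
    """Toy recovery: stall VMs, boot a new instance, re-adopt VMs, resume."""
    preserved = list(failed.vms.values())
    for vm in preserved:
        vm.state = VMState.STALLED   # VMs are stalled during the reboot
    fresh = boot_hypervisor()        # static data of the failed instance is discarded
    for vm in preserved:
        fresh.vms[vm.name] = vm      # VM state is kept in place, not copied
    for vm in fresh.vms.values():
        vm.state = VMState.RUNNING   # resume once the new instance is running
    return fresh
```

In this model, calling `recover(hv)` on an instance whose `static_data` has been corrupted returns a new `Hypervisor` with pristine static data while the same `VM` objects, with their `memory` untouched, continue in the `RUNNING` state.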



