ReHype: enabling VM survival across hypervisor failures

Michael V. Le, Y. Tamir
{"title":"ReHype: enabling VM survival across hypervisor failures","authors":"Michael V. Le, Y. Tamir","doi":"10.1145/1952682.1952692","DOIUrl":null,"url":null,"abstract":"With existing virtualized systems, hypervisor failures lead to overall system failure and the loss of all the work in progress of virtual machines (VMs) running on the system. We introduce ReHype, a mechanism for recovery from hypervisor failures by booting a new instance of the hypervisor while preserving the state of running VMs. VMs are stalled during the hypervisor reboot and resume normal execution once the new hypervisor instance is running. Hypervisor failures can lead to arbitrary state corruption and inconsistencies throughout the system. ReHype deals with the challenge of protecting the recovered hypervisor instance from such corrupted state and resolving inconsistencies between different parts of hypervisor state as well as between the hypervisor and VMs and between the hypervisor and the hardware. We have implemented ReHype for the Xen hypervisor. The implementation was done incrementally, using results from fault injection experiments to identify the sources of dangerous state corruption and inconsistencies. The implementation of ReHype involved only 880 LOC added or modified in Xen. The memory space overhead of ReHype is only 2.1MB for a pristine copy of the hypervisor code and static data plus a small reserved memory area. The fault injection campaigns used to evaluate the effectiveness of ReHype involved a system with multiple VMs running I/O and hypercall-intensive benchmarks. Our experimental results show that the ReHype prototype can successfully recover from over 90% of detected hypervisor failures.","PeriodicalId":202844,"journal":{"name":"International Conference on Virtual Execution Environments","volume":"4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"36","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Virtual Execution Environments","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1952682.1952692","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 36

Abstract

With existing virtualized systems, hypervisor failures lead to overall system failure and the loss of all the work in progress of virtual machines (VMs) running on the system. We introduce ReHype, a mechanism for recovery from hypervisor failures by booting a new instance of the hypervisor while preserving the state of running VMs. VMs are stalled during the hypervisor reboot and resume normal execution once the new hypervisor instance is running. Hypervisor failures can lead to arbitrary state corruption and inconsistencies throughout the system. ReHype deals with the challenge of protecting the recovered hypervisor instance from such corrupted state and resolving inconsistencies between different parts of hypervisor state as well as between the hypervisor and VMs and between the hypervisor and the hardware. We have implemented ReHype for the Xen hypervisor. The implementation was done incrementally, using results from fault injection experiments to identify the sources of dangerous state corruption and inconsistencies. The implementation of ReHype involved only 880 LOC added or modified in Xen. The memory space overhead of ReHype is only 2.1MB for a pristine copy of the hypervisor code and static data plus a small reserved memory area. The fault injection campaigns used to evaluate the effectiveness of ReHype involved a system with multiple VMs running I/O and hypercall-intensive benchmarks. Our experimental results show that the ReHype prototype can successfully recover from over 90% of detected hypervisor failures.
ReHype:允许虚拟机在虚拟机管理程序故障时存活
对于现有的虚拟化系统,管理程序故障会导致整个系统故障,并且丢失系统上运行的虚拟机正在进行的所有工作。我们引入ReHype,这是一种通过在保持运行中的虚拟机状态的同时启动虚拟机管理程序的新实例来从虚拟机管理程序故障中恢复的机制。虚拟机在hypervisor重启期间会停止运行,在新的hypervisor实例运行后恢复正常运行。管理程序故障可能导致整个系统的任意状态损坏和不一致。ReHype处理的挑战是保护已恢复的管理程序实例免受这种损坏状态的影响,并解决管理程序状态的不同部分之间、管理程序与vm之间以及管理程序与硬件之间的不一致。我们已经为Xen管理程序实现了ReHype。该实现是逐步完成的,使用故障注入实验的结果来识别危险状态损坏和不一致的来源。ReHype的实现只涉及在Xen中添加或修改的880个LOC。ReHype的内存空间开销仅为2.1MB,用于管理程序代码和静态数据的原始副本以及一个小的保留内存区域。用于评估ReHype有效性的故障注入活动涉及一个具有多个运行I/O和超调用密集型基准的vm的系统。我们的实验结果表明,ReHype原型可以成功地从90%以上检测到的hypervisor故障中恢复。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信