Optimizing crash dump in virtualized environments

International Conference on Virtual Execution Environments. Pub Date: 2010-03-17. DOI: 10.1145/1735997.1736003

Yijian Huang, Haibo Chen, B. Zang


Citations: 1

Abstract

A crash dump, or core dump, is the typical way to save a memory image on system crash for later offline debugging and analysis. However, on typical server machines with abundant memory, the time spent on a core dump can significantly increase the mean time to repair (MTTR) by delaying reboot-based recovery, while not dumping the failure context for analysis risks recurring crashes from the same problem.

In this paper, we propose several optimization techniques for core dump in virtualized environments to shorten the MTTR of consolidated virtual machines during crashes. First, we run the crash dump in parallel with the reboot of the crashed VM by dynamically reclaiming and allocating memory between the crashed VM and the newly spawned VM. Second, we use the virtual machine management layer to introspect critical data structures of the crashed VM and filter unused memory out of the dump. Finally, we implement disk I/O rate control between the core dump and the newly spawned VM, following a user-tuned rate control policy, to balance crash dump time against quality of service in the recovery VM.

We have implemented a working prototype, Vicover, that optimizes core dump on system crash of a virtual machine in Xen to minimize the MTTR of core dump and recovery as a whole. In our experiment on a virtualized TPC-W server, Vicover shortens the downtime caused by crash dump by around 5X.
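To make the second and third techniques concrete, here is a minimal sketch in Python, not Vicover's implementation. It assumes the set of free page frame numbers has already been obtained (the abstract does not say which guest data structures are introspected), and it models rate control as simple pacing at a fixed byte rate in simulated time. All names (`filter_used_pages`, `paced_dump`, `PAGE_SIZE`) are hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


def filter_used_pages(
    memory: Dict[int, bytes], free_pfns: Set[int]
) -> Iterator[Tuple[int, bytes]]:
    """Yield (pfn, data) only for pages that are not marked free.

    `free_pfns` stands in for whatever the management layer learns by
    introspecting the crashed VM; how it is computed is out of scope here.
    """
    for pfn in sorted(memory):
        if pfn not in free_pfns:
            yield pfn, memory[pfn]


def paced_dump(
    memory: Dict[int, bytes], free_pfns: Set[int], rate_bytes_per_s: float
) -> Tuple[List[int], float]:
    """Return (dumped pfns, simulated seconds) for a filtered, paced dump.

    Each page write is charged len(data) / rate seconds, so a lower rate
    leaves more disk bandwidth for the recovery VM but lengthens the dump.
    """
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    dumped: List[int] = []
    elapsed = 0.0
    for pfn, data in filter_used_pages(memory, free_pfns):
        dumped.append(pfn)  # a real dumper would write `data` to disk here
        elapsed += len(data) / rate_bytes_per_s
    return dumped, elapsed


if __name__ == "__main__":
    mem = {pfn: bytes(PAGE_SIZE) for pfn in range(4)}
    pfns, seconds = paced_dump(mem, free_pfns={1, 3}, rate_bytes_per_s=8192)
    print(pfns, seconds)
```

In this toy model, skipping free pages shortens the dump in proportion to the fraction of unused memory, and the rate parameter plays the role of the user-tuned policy that trades dump time against service quality in the recovery VM.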
