Workload Adaptive Checkpoint Scheduling of Virtual Machine Replication

2011 IEEE 17th Pacific Rim International Symposium on Dependable Computing. Pub Date: 2011-12-12. DOI: 10.1109/PRDC.2011.32

Balazs Gerofi, Y. Ishikawa

Citations: 10

Abstract

Checkpoint-recovery based Virtual Machine (VM) replication is an emerging approach to providing high availability for VM installations, notably because of its inherent ability to handle symmetric multiprocessing (SMP) virtual machines, i.e., VMs with multiple virtual CPUs (vCPUs). However, it comes at the price of significant performance degradation of the application executed in the VM, because a large amount of state needs to be synchronized between the primary and the backup machines. Previous research on improving VM replication performance focused primarily on decreasing the amount of data transferred over the network, while relying on a constant checkpoint frequency. Our goal is to investigate how, and to what extent, performance degradation can be mitigated by adjusting the checkpoint period dynamically. We provide a comprehensive analysis of various workloads from the perspective of VM replication, paying special attention to their behavior as the number of vCPUs in the system increases. We propose several heuristics for scheduling replication checkpoints in order to improve quality of service. Our algorithm adapts dynamically to the properties of the workload being executed in the VM, such as changes in the number of dirtied memory pages and in network and disk I/O operations, as well as to the network bandwidth available for replication. We evaluate our scheduling algorithm over two network architectures: Gigabit Ethernet and InfiniBand, a high-performance interconnect fabric. We find that checkpoint scheduling has a great impact on the performance of replicated virtual machines, and show that replicated virtual machines with up to 16 vCPUs can attain performance close to native VM execution, not only over high-performance but also over commercial network architectures.
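The abstract does not spell out the heuristics themselves, so the following is only a minimal sketch of what a workload-adaptive checkpoint period could look like, not the authors' algorithm. It takes the three inputs the abstract names (dirtied pages, pending I/O, available bandwidth). Everything else is a hypothetical choice made for illustration: the `Sample` type, the `next_period_ms` function, the 4 KiB page size, the period bounds, the 20% overhead target, and the assumption that output I/O is held back until the next checkpoint completes.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumed page size, not taken from the paper


@dataclass
class Sample:
    """Workload observations for the interval since the last checkpoint (hypothetical)."""
    dirty_pages: int      # memory pages dirtied since the last checkpoint
    pending_io_ops: int   # network/disk operations waiting on the next checkpoint
    bandwidth_bps: float  # bytes per second currently available for replication


def next_period_ms(sample, current_ms, min_ms=20.0, max_ms=500.0,
                   target_overhead=0.2):
    """Return the next checkpoint period in milliseconds.

    Illustrative heuristic only; all thresholds are assumptions.
    """
    if sample.bandwidth_bps <= 0:
        # No usable replication link: back off to the longest period.
        return max_ms

    # Estimated time needed to ship the dirtied pages to the backup.
    transfer_ms = sample.dirty_pages * PAGE_SIZE / sample.bandwidth_bps * 1000.0

    if sample.pending_io_ops > 0:
        # I/O-bound phase: checkpoint sooner so held-back output is released.
        proposed = current_ms / 2.0
    else:
        # Compute-bound phase: lengthen the period until the transfer time
        # is at most target_overhead of the period.
        proposed = max(current_ms, transfer_ms / target_overhead)

    # Keep the period inside the configured bounds.
    return min(max_ms, max(min_ms, proposed))
```

For example, `next_period_ms(Sample(dirty_pages=0, pending_io_ops=3, bandwidth_bps=125e6), current_ms=100.0)` halves the period to 50 ms, while a memory-intensive interval with no pending I/O pushes the period toward the upper bound. The trade-off being illustrated is between low output latency for I/O-bound phases and fewer, larger checkpoints for compute-bound phases.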
